Series Foreword


Theoretical computer science has now undergone several decades of develop- 
ment. The “classical” topics of automata theory, formal languages, and computa- 
tional complexity have become firmly established, and their importance to other 
theoretical work and to practice is widely recognized. Stimulated by technological 
advances, theoreticians have been rapidly expanding the areas under study, and the 
time delay between theoretical progress and its practical impact has been decreas- 
ing dramatically. Much publicity has been given recently to breakthroughs in 
cryptography and linear programming, and steady progress is being made on pro- 
gramming language semantics, computational geometry, and efficient data struc- 
tures. Newer, more speculative, areas of study include relational databases, VLSI 
theory, and parallel and distributed computation. As this list of topics continues 
expanding, it is becoming more and more difficult to stay abreast of the progress 
that is being made and increasingly important that the most significant work be 
distilled and communicated in a manner that will facilitate further research and 
application of this work. 


By publishing comprehensive books and specialized monographs on the 
theoretical aspects of computer science, the series on Foundations of Computing 
provides a forum in which important research topics can be presented in their 
entirety and placed in perspective for researchers, students, and practitioners alike. 
This volume, by Michael J. O’Donnell, presents an elegant and powerful interpre- 
tive system for programming in terms of abstract logical equations. The language 
is similar to Prolog, in that it is descriptive rather than procedural, but unlike Pro- 
log its semantic description allows an efficient implementation that strictly adheres 
to the given semantics. The presentation provides the definition of the language, 
many examples of its use, and discussion of the relevant underlying theory. It is 
essential reading for anyone interested in the latest ideas about nonprocedural pro- 
gramming and practical programming language semantics. 


Michael R. Garey 


Preface 


This book describes an ongoing equational programming project that started in 
1975. Principal investigators on the project are Christoph Hoffmann and Michael 
O'Donnell. Paul Chew, Paul Golick, Giovanni Sacco, and Robert Strandh partici- 
pated as graduate students. I am responsible for the presentation at hand, and the 
opinions expressed in it, but different portions of the work described involve each of 
the people listed above. I use the pronoun "we" throughout the remainder, to indi- 
cate unspecified subsets of that group. Specific contributions that can be attributed 
to one individual are acknowledged by name, but much of the quality of the work 
is due to untraceable interactions between several people, and should be credited to 
the group. 

The equational programming project never had a definite pseudocommercial 
goal, although we always hoped to find genuinely useful applications. Rather than 
seeking a style of computing to support a particular application, we took a clean, 
simple, and elegant style of computing, with particularly elementary semantics, and 
asked what it is good for. As a result, we adhered very strictly to the original con- 
cept of computing with equations, even when certain extensions had obvious prag- 
matic value. On the other hand, we were quite willing to change the application. 
Originally, we envisioned equations as formal descriptions of interpreters for other 
programming languages. When we discovered that such applications led to
outrageous overhead, but that programs defined directly by equations ran quite
competitively with LISP, we switched application from interpreter generation to
programming with equations.


We do not apologize for our fanaticism about the foundations of equational
programming, and our cavalier attitude toward applications. We believe that good
mathematics is useful, but not always for the reasons that motivated its creation
(non-Euclidean geometry is a positive example, the calculus a negative one). Also, 
while recognizing the need for programming languages that support important 
applications immediately, we believe that scientific progress in the principles of pro- 
gramming and programming languages is impeded by too quick a reach for appli- 
cations. The utility of LISP, for example, is unquestionable, but the very adjust- 
ments to LISP that give it success in many applications make it a very imprecise 
vehicle for understanding the utility of declarative programming. We would rather 
discover that pure equational programming, as we envision it, is unsuitable for a 
particular application, than to expand the concept in a way that makes it harder to
trace the conceptual underpinnings of its success or failure.


Without committing to any particular type of application, we must experiment 
with a variety of applications, else our approach to programming is pure specula- 
tion. For this purpose, we need an implementation. The implementation must per- 
form well enough that some people can be persuaded to use it. We interpret this 
constraint to mean that it must compete in speed with LISP. Parsers, program- 
ming support, and the other baggage possessed by all programming languages, 
must be good enough not to get in the way, but the main effort should go toward 
demonstrating the feasibility of the novel aspects, rather than solving well
understood problems once again.


The equational programming project has achieved an implementation of an
interpreter for equational programs. The implementation runs under Berkeley
UNIX* 4.1 and 4.2, and is available from the author for experimental use. The
current distribution is not well enough supported to qualify as a reliable tool for
important applications, but we hope to produce a stronger implementation
in the next few years. Sections 1 through 10 constitute a user's manual for
the current implementation. The remainder of the text covers a variety of topics
relating to the theory supporting equational programming, the algorithmic and
organizational problems solved in its implementation, and the special
characteristics of equational programming that qualify it for particular applications.
Some sections discuss work in progress. The intent is to give a solid intuition for
all the identifiable aspects of the project, from its esoteric theoretical foundations
in logic to its concrete implementation as a system of programs, and its potential
applications.

*UNIX is a trademark of AT&T.


Various portions of the work were supported by a Purdue University XL 
grant, by the National Science Foundation under grants MCS-7801812 and MCS- 
8217996, and by the National Security Agency under grant 84H0006. The Purdue 
University Department of Computer Sciences provided essential computing 
resources for most of the implementation effort. I am grateful to Robert Strandh 
and Christoph Hoffmann for critical readings of the manuscript, and to AT&T 
Bell Laboratories for providing phototypesetting facilities. Typesetting was
accomplished using the troff program under UNIX.


EQUATIONAL LOGIC 


as a


PROGRAMMING LANGUAGE 


1. Introduction (adapted from HO82b) 


Computer scientists have spent a large amount of research effort developing the 
semantics of programming languages. Although we understand how to implement 
Algol-style procedural programming languages efficiently, it seems to be very 
difficult to say what the programs mean. The problem may come from choosing an 
implementation of a language before giving the semantics that define correctness of 
the implementation. In the development of the equation interpreter, we reversed 
the process by taking clean, simple, intuitive semantics, and then looking for
correct, efficient implementations.


We suggest the following scenario as a good setting for the intuitive semantics
of computation. Our scenario covers many, but not all, applications of computing
(e.g., real-time applications are not included).


    A person is communicating with a machine. The person gives a sequence of
    assertions followed by a question. The machine responds with an answer or by
    never answering.


The problem of semantics is to define, in a rigorous and understandable way, what 
it means for the machine’s response to be correct. A natural informal definition of 
correctness is that any answer that the machine gives must be a logical conse- 
quence of the person’s assertions, and that failure to give an answer must mean 
that there is no answer that follows logically from the assertions. If the language 
for giving assertions is capable of describing all the computable functions, the 
undecidability of the halting problem prevents the machine from always detecting 
those cases where there is no answer. In such cases, the machine never halts. The 
style of semantics based on logical consequence leads most naturally to a style of
programming similar to that in the descriptive or applicative languages such as
LISP, Lucid, Prolog, Hope, OBJ, SASL and Functional Programming languages,
although Algol-style programming may also be supported in such a way.
Computations under logical-consequence semantics roughly correspond to "lazy
evaluation" of LISP [HM76, FW76].


Semantics based on logical consequence is much simpler than many other 
styles of programming language semantics. In particular, the understanding of
logical-consequence semantics does not require construction of particular models
through lattice theory or category theory, as do the semantic treatments based on 
the work of Scott and Strachey or those in the abstract-data-types literature using 
initial or final algebras. If a program is given as a set of assertions, then the logi- 
cal consequences of the program are merely all those additional assertions that 
must be true whenever the assertions of the program are true. More precisely, an 
equation A=B is a logical consequence of a set E of equations if and only if, in 
every algebraic interpretation for which every equation in E is true, A=B is also 
true (see [O'D77] Chapter 2 and Section 14 of this text for a more technical
treatment). There is no way to determine which one of the many models of the
program assertions was really intended by the programmer: we simply compute for
him all the information we possibly can from what we are given. For those who
prefer to think of a single model, term algebras or initial algebras may be used to
construct one model for which the true equations are precisely the logical
consequences of a given set of equations.
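
The definition of logical consequence given above can also be written symbolically. In the display below, the symbol \models and the letter \mathcal{M} for an algebraic interpretation are notational choices made for this illustration; they are not taken from the text.

```latex
\[
E \models A = B
\quad\Longleftrightarrow\quad
\text{for every algebraic interpretation } \mathcal{M}:\;
\bigl(\mathcal{M} \models C = D \ \text{for all } (C = D) \in E\bigr)
\;\Longrightarrow\; \mathcal{M} \models A = B
\]
```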


We use the language of equational logic to write the assertions of a program. 
Other logical languages are available, such as the first-order predicate calculus, 
used in Prolog [Ko79a]. We have chosen to emphasize the reconciliation of strict
adherence to logical consequences with good run-time performance, at the expense
of generality of the language. Current implementations of Prolog do not always
discover all of the logical consequences of a program, and may waste much time 
searching through irrelevant derivations. With our language of equations, we lose 
some of the expressive power of Prolog, but we always discover all of the logical 
consequences of a program, and avoid searching irrelevant ones except in cases that 
inherently require parallel computation. Hoffmann and O’Donnell survey the 
issues involved in computing with equations in [HO82b]. Section 17 discusses the
question of relevant vs. irrelevant consequences of equations more specifically.

Specializing our computing scenario to equational languages:


    The person gives a sequence of equations followed by a question, "What is E?"
    for some expression E. The machine responds with an equation "E=F," where
    F is a simple expression.


For our equation interpreter, the "simple expressions" above must be the normal 
forms: expressions containing no instance of a left-hand side of an equation. This 
assumption allows the equations to be used as rewriting rules, directing the replace- 
ment of instances of left-hand sides by the corresponding right-hand sides. Sec- 
tions 2 and 3 explain how to use the equation interpreter to act out the scenario 
above. Our equational computing scenario is a special case of a similar scenario 
developed independently by the philosophers Belnap and Steel for a logic of
questions and answers [BS76].
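
The use of equations as rewriting rules can be sketched in a few lines of Python. This is an illustration only: the tuple representation of terms, the function names, and the outermost-first search order are assumptions of the sketch, not a description of how the equation interpreter is implemented. The two rules encode the hypothetical equations add(zero, y) = y and add(succ(x), y) = succ(add(x, y)).

```python
# Terms are tuples ("symbol", arg1, ..., argn); variables in rules are plain strings.

def match(pattern, term, bindings):
    """Extend bindings so that pattern instantiates to term, or return None."""
    if isinstance(pattern, str):                      # a variable matches any term
        if pattern in bindings:
            return bindings if bindings[pattern] == term else None
        return {**bindings, pattern: term}
    if isinstance(term, str) or pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        bindings = match(p, t, bindings)
        if bindings is None:
            return None
    return bindings

def substitute(term, bindings):
    """Replace each variable in term by the term bound to it."""
    if isinstance(term, str):
        return bindings[term]
    return (term[0],) + tuple(substitute(a, bindings) for a in term[1:])

def rewrite_once(term, rules):
    """Replace one instance of a left-hand side, or return None if there is none."""
    for lhs, rhs in rules:
        bindings = match(lhs, term, {})
        if bindings is not None:
            return substitute(rhs, bindings)
    for i, arg in enumerate(term[1:], start=1):
        new = rewrite_once(arg, rules)
        if new is not None:
            return term[:i] + (new,) + term[i + 1:]
    return None

def normal_form(term, rules):
    """Rewrite until no left-hand side applies; may loop forever, like 'never answering'."""
    while True:
        new = rewrite_once(term, rules)
        if new is None:
            return term
        term = new

RULES = [
    (("add", ("zero",), "y"), "y"),
    (("add", ("succ", "x"), "y"), ("succ", ("add", "x", "y"))),
]
ONE = ("succ", ("zero",))
TWO = ("succ", ONE)
print(normal_form(("add", TWO, ONE), RULES))
```

The printed term contains no instance of either left-hand side, so it is a normal form in the sense used above: the answer to "What is add(succ(succ(zero)), succ(zero))?" under these two equations.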


The equation interpreter accepts equations as input, and automatically pro- 
duces a program to perform the computations described by the equations. In order 
to achieve reasonable efficiency, we impose some fairly liberal restrictions on the 
form of equations given. Section 5 describes these restrictions, and Sections 6-8
and 10 present features of the interpreter. Section 15 describes the computational
power of the interpreter in terms of the procedural concepts of parallelism,
nondeterminism, and pipelining.

Typical applications for which the equation interpreter should be useful are:


1. We may write quick and easy programs for the sorts of arithmetic and list- 
manipulating functions that are commonly programmed in languages such as 
LISP. The "lazy evaluation" implied by logical-consequence semantics allows 
us to describe infinite objects in such a program, as long as only finite portions 
are actually used in the output. The advantages of this capability, discussed 
in [FW76, HM76], are similar to the advantages of pipelining between
coroutines in a procedural language. Definitions of large or infinite objects may
also be used to implement a kind of automatic dynamic programming (see
Section 15.4).


2. We may define programming languages by equations, and the equation proces- 
sor will produce interpreters. Thus, we may experiment with the design of a 
programming language before investing the substantial effort required to
produce a compiler or even a hand-coded interpreter.


3. Equations describing abstract data types may be used to produce correct 
implementations automatically, as suggested by [cs78, Wa76], and imple- 
mented independently in the OBJ language [FGJM85]. 

4. Theorems of the form A=B may sometimes be proved by receiving the same
answer to the questions "What is A?" and "What is B?" [KB70, HO88]
discuss such theorem provers. REVE [Le83, FG84] is a system for developing
theorem-proving applications of equations.

5. Non-context-free syntactic checking, and semantics, such as compiler code
generation, may be described formally by equations and used, along with the
conventional formal parsers, to automatically produce compilers (see Section
13).
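
The first application in the list above, describing infinite objects of which only finite portions are used, has a rough analogue in Python generators. The sketch below is an analogy under that assumption, with hypothetical names `squares` and `first`; it says nothing about how the equation interpreter itself evaluates terms.

```python
from itertools import count, islice

def squares():
    """An infinite stream 0, 1, 4, 9, ...; each element is computed only on demand."""
    for n in count():
        yield n * n

def first(n, stream):
    """Force only the first n elements of a possibly infinite stream."""
    return list(islice(stream, n))

print(first(5, squares()))   # prints [0, 1, 4, 9, 16]
```

The definition of `squares` never terminates on its own; the program as a whole terminates because the output depends on a finite prefix only.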


The equation interpreter is intended for use by two different classes of user, in 
somewhat different styles. The first sort of user is interested in computing results 
for direct human consumption, using well-established facilities. This sort of user 
should stay fairly close to the paradigm presented in Section 2, should take the 
syntactic descriptions as fixed descriptions of a programming language, and should 
skip Section 20, as well as other sections that do not relate to the problem at hand. 
The second sort of user is building a new computing product, that will itself be 
used directly or indirectly to produce humanly readable results. This sort of user 
will almost certainly need to modify or redesign some of the syntactic processors, 
and will need to read Sections 13 and 20 rather closely in order to understand how 
to combine equationally-produced interpreters with other sorts of programs. The 
second sort of user is encouraged to think of the equation interpreter as a tool, 
analogous to a formal parser constructor, for building whichever parts of his pro- 
duct are conveniently described by equations. These equational programs may then 
be combined with programs produced by other language processors to perform 
those tasks not conveniently implemented by equations. The aim in using equations 
should be to achieve the same sort of self-documentation and ease of modification 
that may be achieved by formal grammars, in solving problems where context-free
manipulations are not sufficiently powerful.


2. Using the Equation Interpreter Under UNIX (ep and ei) 


Use of the equation interpreter involves two separate steps: preprocessing and 
interpreting. The preprocessing step, like a programming language compiler, 
analyzes the given equations and produces machine code. The interpreting step, 
which may be run any number of times once preprocessing is done, reduces a given 
term to normal form.

Normal use of the equation interpreter requires the user to create a directory
containing 4 files used by the interpreter. The 4 files to be created are:

1. definitions - containing the equations; 

2. pre.in - an input parser for the preprocessor; 

3. int.in - an input parser for the interpreter; 

4. int.out - an output pretty-printer for the interpreter. 


The file definitions, discussed in Section 3, is usually typed in literally by the user. 
The files pre.in, int.in and int.out, which must be executable, are usually produced
automatically by the command loadsyntax, as discussed in Section 4.


To invoke the preprocessor, type the following command to the shell:


ep Equnsdir


where Equnsdir is the directory in which you have created the 4 files above. If no 
directory is given, the current directory is used. Ep will use Equnsdir as the home 
for several temporary files, and produce in Equnsdir an executable file named 
interpreter. Because of the creation and removal of temporary files, the user should 
avoid placing any extraneous files in Equnsdir. Two of the files produced by ep 
are not removed: def.deep and def.in. These files are not strictly necessary for
operation of the interpreter, and may be removed in the interest of space
conservation, but they are useful in building up complex definitions from simpler
ones (Section 14) and in producing certain diagnostic output (Section 10).

To invoke the interpreter, type the command:


ei Equnsdir 


A term found on standard input will be reduced, and its normal form placed on the
standard output.


A paradigmatic session with the equation interpreter has the following form: 


mkdir Equnsdir 

loadsyntax Equnsdir 

edit Equnsdir/definitions using your favorite editor 

ep Equnsdir 

edit input using your favorite editor 

ei Equnsdir <input


The sophisticated user of UNIX may invoke ei from his favorite interactive editor,
such as ned or emacs, in order to be able to simultaneously manipulate the input
and output.


In more advanced applications, if several equation interpreters are run in a 
pipeline, repeated invocation of the syntactic processors may be avoided by invok- 
ing the interpreters directly, instead of using ei. For example, if Equ.1, Equ.2, 
Equ.3 are all directories in which equational interpreters have been compiled, the
following command pipes standard input through all three interpreters:


Equ.1/int.in | Equ.1/interpreter | Equ.2/interpreter |
Equ.3/interpreter | Equ.3/int.out;


Use of ei for the same purpose would involve 4 extra invocations of syntactic
processors, introducing wasted computation and, worse, the possibility that
superficial aspects of the syntax, such as quoting conventions, may affect the
results. If Equ.1, Equ.2, and Equ.3 are not all produced using the same syntax,
careful consideration of the relationship between the different syntaxes will be
needed to make sense of such a pipe.


After specifying the directory containing definitions, the user may give the size
of the workspace to be used in the interpreter. This size defaults to
$2^{15}-1=32767$: the largest that can be addressed in one 16-bit word with a sign
bit. The workspace size limits the size of the largest expression occurring as an
intermediate step in any reduction of an input to normal form. The effect of the
limit is blurred somewhat by sharing of equivalent subexpressions, and by allocation
of space for declared symbols even when they do not actually take part in a
particular computation. For example, to reduce the interpreter workspace to half of
the default, type


ep Equnsdir 16384 


The largest workspace usable in the current implementation is
$2^{31}-2=2147483646$. The limiting factor is the Berkeley Pascal compiler, which
will not process a constant bigger than $2^{31}-1=2147483647$, and which produces
mysteriously incorrect assembly code for an allocation of exactly that much. On
current VAX UNIX implementations, the shell may often refuse to run sizes much
larger than the default because of insufficient main memory. In such a case, the
user will see a message from the shell saying "not enough core" or "too big".
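
The workspace sizes quoted in this section follow from simple word-size arithmetic, which the following lines check. The constant names are hypothetical and chosen only for this illustration.

```python
DEFAULT_WORKSPACE = 2**15 - 1                  # largest value of a 16-bit word with a sign bit
PASCAL_CONSTANT_LIMIT = 2**31 - 1              # largest constant the Pascal compiler accepts
MAX_WORKSPACE = PASCAL_CONSTANT_LIMIT - 1      # the limit itself produces incorrect code
HALF_DEFAULT = (DEFAULT_WORKSPACE + 1) // 2    # the size used in the example command above

print(DEFAULT_WORKSPACE, MAX_WORKSPACE, HALF_DEFAULT)   # prints 32767 2147483646 16384
```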


3. Presenting Equations to the Equation Interpreter 


Input to the equation interpreter, stored in the file definitions, must be of the
following form:

Symbols

    symbol_descriptor_1;
    symbol_descriptor_2;
    ...
    symbol_descriptor_m.

For all variable_1, variable_2, ... variable_n:

    equation_1;
    equation_2;
    ...
    equation_p.


The principal keywords recognized by the preprocessor are Symbols, For all, and 
Equations, appearing at the beginning of a line. Equations is an alternative to 
For all used in the unusual case that there are no variables in the equations. Cap- 
italization of these keywords is optional, and any number of blanks greater than 0 
may appear between For and all. The only other standard keywords are include, 
where, end where, is, are, in, either, or, and end or. The special symbols used by 
the preprocessor are ":", ";", ".", ",", "", and "". Particular term syntaxes (see
Section 4) may entail other keywords and special symbols. Blanks are required only
where necessary to separate alphanumeric strings. Any line beginning with "" is a
comment, with no impact on the meaning of a specification.


symbol_descriptors indicate one or more symbols in the language to be
defined, and give their arities. Intuitively, symbols of arity 0 are the constants of
the language, and symbols of higher arity are the operators. A symbol_descriptor
is either of the form


symbol_1, symbol_2, ... symbol_m: arity        m≥1


or of the form


include symbol_class_1, ... symbol_class_n        n≥1


Syntactically, symbols and symbol_classes are identifiers: strings other than key- 
words beginning with an alphabetic symbol followed by any combination of alpha- 
betic symbols, base-ten digits, "_", and "-". Identifiers are currently limited to 20 
characters, a restriction which will be removed in future versions. A symbol_class
indicates the inclusion of a large predefined class of symbols. These classes are
discussed in Section 6. Symbols that have been explicitly declared in the Symbols
section are called literal symbols, to distinguish them from members of the
predefined classes.
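
The rule for identifiers stated above can be expressed as a small check. This is a sketch under stated assumptions: "alphabetic symbol" is taken to mean an ASCII letter, only the single-word keywords from the list above are excluded, keywords are compared without regard to case, and the names `KEYWORDS`, `IDENTIFIER`, and `is_identifier` are hypothetical. The treatment of the words that make up multiword keywords (For all, end where, end or) is not modelled.

```python
import re

# Assumed reserved words: the single-word keywords listed in this section.
KEYWORDS = {"symbols", "equations", "include", "where", "is", "are", "in", "either", "or"}

# A letter followed by any combination of letters, base-ten digits, "_" and "-".
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")

MAX_LENGTH = 20   # current limit on identifier length

def is_identifier(s):
    """Return True if s is acceptable as a symbol, symbol_class, or variable name."""
    return (IDENTIFIER.fullmatch(s) is not None
            and len(s) <= MAX_LENGTH
            and s.lower() not in KEYWORDS)

print(is_identifier("cons"), is_identifier("1st"), is_identifier("include"))
```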


variables are identifiers, of the same sort as symbols. An equation is either of
the form


term_1 = term_2


of the form


term_1 = term_2 where qualification


or of the form


include equation_class_1, ..., equation_class_n


The syntax of terms is somewhat flexible, and is discussed in Section 4.
qualifications are syntactic constraints on substitutions for variables, and are
discussed in Section 8. equation_classes are identifiers indicating the inclusion of a
large number of predefined equations. These classes are discussed in Section 7.


For larger problems, the notation presented in this section will surely not be 
satisfactory, because it provides no formal mechanism for giving structure to a 
large definition. Section 14 describes a set of operators that may be applied to one 
or more equational definitions to produce useful extensions, modifications, and com- 
binations of the definitions. The idea for these definition-constructing operators 
comes from work on abstract data types by Burstall and Goguen, implemented in 
the language OBJ [BG77]. Users are strongly encouraged to start using these 
operators as soon as a definition begins to be annoyingly large. The current version 
does not implement operators on definitions, so most users will not want to attack
large problems until a more advanced version is released.


The syntax presented in this section is collected in the BNF below.

<program> ::= Symbols <symbol descriptor list>.
              For all <variable list>: <equation list>.

<symbol descriptor list> ::= <symbol descriptor>; ... ; <symbol descriptor>

<symbol descriptor> ::= <symbol list>: <arity> |
                        include <symbol class list>

<symbol class list> ::= <symbol class>, ..., <symbol class>

<symbol class> ::= atomic_symbols | integer_numerals | truth_values

<symbol list> ::= <symbol>, ..., <symbol>

<symbol> ::= <identifier>

<arity> ::= <number>

<variable list> ::= <variable>, ..., <variable>

<variable> ::= <identifier>

<equation list> ::= <equation>; ... ; <equation>

<equation> ::= <term> = <term> |
               <term> = <term> where <qualification> end where |
               include <equation class list>

<qualification> ::= <qualification item list>

<qualification item list> ::= <qualification item>, ..., <qualification item>

<qualification item> ::= <variable> is <qualified term> |
                         <variable list> are <qualified term>

<qualified term> ::= in <symbol class> |
                     <term> |
                     <qualified term> where <qualification> end where |
                     either <qualified term list> end or

<qualified term list> ::= <qualified term> or ... or <qualified term>


4. The Syntax of Terms (loadsyntax)


Since no single syntax for terms is acceptable for all applications of the equation 
interpreter, we provide a library of syntaxes from which the user may choose the 
one best suited to his application. The more sophisticated user, who wishes to 
custom-build his own syntax, should see Section 20 on implementation to learn the
requirements for parsers and pretty-printers.

To choose a syntax from the current library, type the command


loadsyntax Equnsdir Syntax 


where Equnsdir is the directory containing the preprocessor input, and Syntax is 
the name of the syntax to be seen by the user. Loadsyntax will create the 
appropriate pre.in, int.in, and int.out files in Equnsdir to process the selected
syntax. If Syntax is omitted, LISP.M is used by default. If Equnsdir is also omitted,
the current directory is used.


In order to distinguish atomic symbols from nullary literal symbols in input to 
the interpreter, the literal symbols must be written with an empty argument list. 
Thus, in Standmath notation, a() is a literal symbol, and a is an atomic symbol.
In LISP.M, the corresponding notations are a[] and a. This regrettable notational
clumsiness should disappear in later versions.


4.1 Standmath: Standard Mathematical Notation 


The Standmath syntax is standard mathematical functional prefix notation, with
arguments surrounded by parentheses and separated by commas, such as
f(g(a,b),c,h(e)). Empty argument lists are allowed, as in f(). This syntax is
used as the standard of reference (but is not the default choice); all others are
described as special notations for Standmath terms.


4.2 LISP.M: Extended LISP Notation

LISP.M is a liberal LISP notation, which mixes M-expression notation freely with 
S-expressions [McC60]. Invocation of LISP.M requires declaration of the nullary 
symbol nil and the binary symbol cons. An M-expression accepted by LISP.M 
may be in any of the following forms: 

atomic_symbol

nil()

(M-expr_1 M-expr_2 ... M-expr_m)                    m≥0

(M-expr_1 M-expr_2 ... M-expr_{n-1} . M-expr_n)     n>1

function[M-expr_1; M-expr_2; ... M-expr_p]          p≥0


(M-expr_1 ... M-expr_{n-1} . M-expr_n)

is a notation for

cons(M-expr_1, ... cons(M-expr_{n-1}, M-expr_n) ... )


(M-expr_1 ... M-expr_m)

is a notation for

cons(M-expr_1, cons(M-expr_2, ... cons(M-expr_m, nil()) ... ))


function[M-expr_1; ... M-expr_p]

is a notation for

function(M-expr_1, ... M-expr_p)
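
The expansion of list notation into cons terms can be sketched as follows. The tuple representation of terms and the names `cons_list` and `show` are assumptions of this illustration, not part of LISP.M or of the interpreter's parsers; atomic symbols are modelled as plain strings.

```python
NIL = ("nil",)   # the nullary literal symbol nil, written nil() in Standmath

def cons_list(items, tail=NIL):
    """Expand (e1 e2 ... em), or (e1 ... e_{n-1} . tail) when tail is given, into cons terms."""
    term = tail
    for item in reversed(items):
        term = ("cons", item, term)
    return term

def show(term):
    """Print a term in Standmath notation; atomic symbols are plain strings."""
    if isinstance(term, str):
        return term
    return term[0] + "(" + ", ".join(show(arg) for arg in term[1:]) + ")"

print(show(cons_list(["a", "b"])))        # prints cons(a, cons(b, nil()))
print(show(cons_list(["a"], tail="b")))   # prints cons(a, b)
print(show(cons_list([])))                # prints nil()
```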

4.3 Lambda: A Lambda Calculus Notation


Lambda notation is intended for use in experiments with evaluation strategies for 
the lambda calculus. This notation supports the most common abbreviations con- 
veniently, while allowing unusual sorts of expressions to be described at the cost of 
less convenient notation. Because of the highly experimental nature of this syntax, 
less attention has been given to providing useful error messages. Since this lambda 
notation was developed to support one particular series of experiments with
reduction strategies, it will probably not be suitable for all uses of the lambda calculus.


\x.E 


is a notation for 


Lambda(cons(x, nil()), E)


where x must be an atomic symbol representing a variable. 


(E F) 


is a notation for 


AP(E, F) 


In principle, the notations above are sufficient for describing arbitrary lambda
terms, but for convenience, multiple left-associated applications may be given with
only one parenthesis pair. Thus,


(E_1 E_2 E_3 ... E_n)        n≥2


is a notation for


AP( ... AP(AP(E_1, E_2), E_3), ... E_n)


Similarly, many variables may be lambda bound by a single use of "\". Thus, 


\x_1 x_2 ... x_n.E        n≥1


is a notation for


Lambda(cons(x_1, cons(x_2, ... nil()) ... ), E)


Notice that the list of variables is given as a LISP list, rather than the more
conventional representation as


Lambda(x_1, Lambda(x_2, ... Lambda(x_n, E) ... ))


It is easy to write equations to translate the listed-variable form into the more
conventional representation, but the listed form allows reduction strategies to take
advantage of nested Lambdas. In order to write equations manipulating lists of
variables, it is necessary to refer to a list of unknown length. So,


\x_1 x_2 ... x_n:rem.E        n≥0


is a notation for


Lambda(cons(x_1, cons(x_2, ... rem) ... ), E)


That is, rem above represents the remainder of the list beyond x_1 ... x_n. In the
special case where n=0,


\:list.E


is a notation for 


Lambda(list, E) 
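
The abbreviations for applications and bound-variable lists described so far can be sketched in Python. The tuple representation of terms and the function names below are assumptions of this illustration. The last function performs, in Python rather than by equations, the translation from the listed-variable form to the conventional nested form.

```python
from functools import reduce

NIL = ("nil",)

def application(terms):
    """(E1 E2 ... En), n >= 2, as left-nested AP terms."""
    return reduce(lambda function, argument: ("AP", function, argument), terms)

def listed_lambda(variables, body, rest=NIL):
    """A lambda term whose bound variables are kept as a cons list ending in rest."""
    var_list = rest
    for v in reversed(variables):
        var_list = ("cons", v, var_list)
    return ("Lambda", var_list, body)

def nested_lambda(term):
    """Translate Lambda(cons(x1, ... nil()), E) into Lambda(x1, Lambda(x2, ... E))."""
    _, var_list, body = term
    variables = []
    while var_list != NIL:
        _, head, var_list = var_list
        variables.append(head)
    for v in reversed(variables):
        body = ("Lambda", v, body)
    return body

print(application(["f", "x", "y"]))
print(nested_lambda(listed_lambda(["x", "y"], "E")))
```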


In order to deal with special internal forms, such as de Bruijn notation [deB72],
the form


\iE 


is allowed as a notation for 


Lambda(i, E) 


where i is an integer numeral. If function symbols other than Lambda and AP
must be introduced, a bracketed style of function application may be used, in
which


f[E_1 ... E_n]        n>0


is a notation for


f(E_1, ... E_n)


4.4 Inner Syntaxes (for the advanced user with a large problem) 


Independently of the surface syntax in which terms are written, it may be helpful 
to use different internal representations of terms for different purposes. For
example, instead of having a number of function symbols of different arities, it is
sometimes convenient to use only one binary symbol, AP, representing function
application, and to represent all other functions by nullary symbols. Application of a
function to multiple arguments is represented by a sequence of separate
applications, one for each argument. The translation from standard notation to this
applicative notation is often called Currying. For example, the term


f(g(a, b), h(c))

is Curried to

AP(AP(f, AP(AP(g, a), b)), AP(h, c)).
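
The translation just described can be sketched in a few lines of Python. This is only an illustration, not part of the equation interpreter: the Term class, curry, and show are hypothetical names introduced here, and nullary symbols are printed without parentheses for brevity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """A term in standard notation: a symbol applied to argument terms."""
    symbol: str
    args: Tuple["Term", ...] = ()


def curry(term: Term) -> Term:
    """Translate standard notation into applicative notation built from AP."""
    result = Term(term.symbol)            # the head becomes a nullary symbol
    for arg in term.args:                 # one AP per argument, left to right
        result = Term("AP", (result, curry(arg)))
    return result


def show(term: Term) -> str:
    """Render a term; nullary symbols are printed bare."""
    if not term.args:
        return term.symbol
    return term.symbol + "(" + ", ".join(show(a) for a in term.args) + ")"


a, b, c = Term("a"), Term("b"), Term("c")
example = Term("f", (Term("g", (a, b)), Term("h", (c,))))
curried = show(curry(example))
print(curried)
```

Each argument adds one AP node, so a symbol of arity n contributes n nested applications.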


Since the performance of the pattern-matching techniques used by the equation
interpreter is affected by the internal representation of the patterns, it may be
important to choose the best such representation in order to solve large problems.
The current version of the system is not particularly sensitive to such choices, but
earlier versions were, and later versions may again be so. In order to use an alternate
internal representation, type


loadsyntax Equnsdir Outersynt Innersynt 


where Outersynt is one of the syntaxes described in Sections 4.1-4.3, and Innersynt
is the name of the chosen internal representation. Currently, only two internal
representations are available. Standmath is the standard mathematical notation,
so

loadsyntax Equnsdir Outersynt Standmath

is equivalent to

loadsyntax Equnsdir Outersynt

The other internal representation is Curry, described above.

5. Restrictions on Equations


In order for the reduction strategies used by the equation interpreter to be correct 
according to the logical-consequence semantics, some restrictions must be placed on 
the equations. The user may learn these restrictions by study, or by trial and error, 
since the preprocessor gives messages about each violation. Presently, 5 restrictions 
are enforced: 


1. No variable may be repeated on the left side of an equation. For instance,

      if(x,y,y) = y

is prohibited, because of the 2 instances of y on the left side.


2. Every variable appearing on the right side of an equation must also appear on 
the left. For instance, f(x)=y is prohibited. 
3. Two different left sides may not match the same expression. So the pair of
equations

      g(0,x) = 0;   g(x,1) = 0

is prohibited, because both of them apply to g(0,1).

4. When two (not necessarily different) left-hand sides match two different parts
of the same expression, the two parts must not overlap. E.g., the pair of
equations

      first(pred(x)) = predfunc;   pred(succ(x)) = x

is prohibited, since the left-hand sides overlap in first(pred(succ(0))).


5. It must be possible, in a left-to-right preorder traversal of any term, to identify
an instance of a left-hand side without traversing any part of the term
below that instance. This property is called left-sequentiality. For example,
the pair of equations

      f(g(x,a),y) = 0;   g(b,c) = 1

is prohibited, since after scanning f(g it is impossible to decide whether to
look at the first argument to g in hopes of matching the b in the second equation,
or to skip it and try to match the first equation.


Violations of left-sequentiality may often be avoided by reordering the argu- 
ments to a function. For example, the disallowed equations above could be 
replaced by f(g(a,x),y) = 0 and g(c,b) = 1. Left-sequentiality does not neces- 
sarily imply that leftmost-outermost evaluation will work. Rather, it means that in 
attempting to create a redex at some point in a term, the evaluator can determine 
whether or not to perform reductions within a leftward portion of the term without 
looking at anything to the right. Left-sequentiality is discussed in more detail in
Sections 17 and 18.3.

All five of these restrictions are enforced by the preprocessor. Violations produce
diagnostic messages and prevent compiling of an interpreter. The left-sequentiality
restriction (5) subsumes the nonoverlapping restriction (4), but later
versions of the system will remove the sequentiality constraint. Later versions will
also relax restriction (3) to allow compatible left-hand sides when the right-hand
sides agree.
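
Restrictions 1 and 2 are purely local to one equation, so they are easy to check mechanically. The following Python sketch shows one possible check; it is not the preprocessor's actual algorithm, and Var, Fn, variables, and violations are hypothetical names introduced only for this illustration. Restrictions 3 to 5 involve pairs of left-hand sides and are not covered.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str
    args: Tuple["Term", ...] = ()


Term = Union[Var, Fn]


def variables(term: Term) -> List[str]:
    """List variable occurrences from left to right, keeping repeats."""
    if isinstance(term, Var):
        return [term.name]
    found: List[str] = []
    for arg in term.args:
        found.extend(variables(arg))
    return found


def violations(left: Term, right: Term) -> List[int]:
    """Return which of restrictions 1 and 2 the equation left = right breaks."""
    broken = []
    left_vars = variables(left)
    if len(left_vars) != len(set(left_vars)):         # restriction 1
        broken.append(1)
    if not set(variables(right)) <= set(left_vars):   # restriction 2
        broken.append(2)
    return broken


x, y = Var("x"), Var("y")
print(violations(Fn("if", (x, y, y)), y))               # y repeated on the left
print(violations(Fn("f", (x,)), y))                     # y only on the right
print(violations(Fn("pred", (Fn("succ", (x,)),)), x))   # acceptable
```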


6. Predefined Classes of Symbols 

It is sometimes impossible to list in advance all of the symbols to be processed by a
particular set of equations. Therefore, we allow 4 predefined classes of symbols to
be invoked by name. These classes consist entirely of constants, that is, nullary
symbols.

6.1. integer_numerals

The integer_numerals include all of the sequences of base-10 digits, optionally preceded
by "-". Numerals are limited to fit in a single machine word: the range
-2147483647 to +2147483647 on the current VAX implementation. Later versions
will use the operators of Section 14 to provide arbitrary precision integer arithmetic.


6.2. truth_values 


The truth_values are the symbols true and false. They are included as a
predefined class for standardization.

6.3. characters

The characters are ASCII characters, presented in single or double
quotes. The only operations available are conversions between characters
and integer numerals. Later versions will use the operators of Section 14 to provide
arbitrarily long character strings, and some useful string-manipulating operations.


6.4. atomic_symbols 


The atomic_symbols are structureless symbols whose only detectable relations are
equality and inequality. Every identifier different from true and false, and not
having any arguments, is taken to be an atomic symbol. In order to distinguish
nullary literal symbols from atomic symbols, the literal symbols are given null
strings of arguments, such as lit() (in Standmath notation) and lit[] (in LISP.M
notation). Currently, atomic_symbols are limited to lengths from 0 to 20. Later
versions will use the operators of Section 14 to provide arbitrarily long
atomic_symbols.

Section 7 describes predefined functions which operate on these classes of symbols.


7. Predefined Classes of Equations 

The predefined classes of equations described in this section were introduced to 
provide access to selected machine instructions, particularly those for arithmetic 
operations, without sacrificing the semantic simplicity of the equation interpreter, 
and without introducing any new types of failure, such as arithmetic overflow. 
Only those operations that are extremely common and whose implementations in 
machine instructions bring substantial performance benefits are included. The
intent is to provide a minimal set of predefined operations from which more power- 
ful operations may be defined by explicitly-given equations. So, every predefined 
operation described below has the same effect as a certain impractically large set of 
equations, and the very desirable extensions of these sets of equations to handle 
multiple-word objects are left to be done by explicitly-given equations in later ver- 
sions. 

For each predefined class of symbols, there are predefined classes of equations
defining standard functions for those symbols. Some of the functions produce
values in a class other than the class of the arguments. Predefined classes of
equations allow a user to specify a prohibitively large set of equations concisely,
and allow the implementation to use special, more efficient techniques to process
those equations than are used in general. When a predefined class of functions is
invoked, all of the relevant function symbols and classes of symbols must be
declared as well. We will describe the functions defined for each class of symbols.
The associated class of equations is the complete graph of the function. For example,
the integer function add has the class of equations containing add(0,0)=0,
add(0,1)=1, ... add(1,0)=1, add(1,1)=2, ... .

7.1. Functions on atomic_symbols


equatom     equ(x,y) = true if x=y,
                       false otherwise

7.2. Integer Functions

multint     multiply(x,y) = x * y

divint      divide(x,y) = the greatest integer ≤ x/y if y≠0

modint      modulo(x,y) = x - (y*divide(x,y)) if y≠0,
                          x otherwise

addint      add(x,y) = x + y

subint      subtract(x,y) = x - y

equint      equ(x,y) = true if x=y,
                       false otherwise

lessint     less(x,y) = true if x<y,
                        false otherwise


An expression starting with the function divide will not be reduced at all if the
second argument is 0. Thus, the output will give full information about the
erroneous use of this function. Similarly, additions and multiplications leading to
overflow will simply not be performed. Later versions will perform arbitrary precision
arithmetic (see Section 9.6), removing this restriction.
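
The behavior described above, where a predefined function is simply undefined outside its safe range, can be pictured as a partial function. The Python sketch below is an illustration under stated assumptions, not the interpreter's implementation: None stands for "the term is left unreduced", the limits are the single-word range quoted in Section 6.1, and divide is read as floor division.

```python
from typing import Optional

MAX_INT = 2147483647    # single-word limits quoted in Section 6.1
MIN_INT = -2147483647


def in_range(n: int) -> bool:
    return MIN_INT <= n <= MAX_INT


def add(x: int, y: int) -> Optional[int]:
    """x + y, or None (no reduction) when the sum does not fit in one word."""
    total = x + y
    return total if in_range(total) else None


def multiply(x: int, y: int) -> Optional[int]:
    """x * y, or None (no reduction) when the product does not fit."""
    product = x * y
    return product if in_range(product) else None


def divide(x: int, y: int) -> Optional[int]:
    """Greatest integer <= x/y, or None (no reduction) when y is 0."""
    return None if y == 0 else x // y


def modulo(x: int, y: int) -> int:
    """x - y*divide(x, y) when y is nonzero, x otherwise."""
    return x if y == 0 else x - y * (x // y)


print(add(1, 1), add(MAX_INT, 1), divide(7, 2), divide(1, 0), modulo(7, 0))
```

Because an out-of-range application is never rewritten, no new kind of failure is introduced: the unreduced term itself appears in the output.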


7.3. Character Functions 


equchar     equ(x,y) = true if x=y,
                       false otherwise

intchar     char(i) = the ith character in a standard ordering

charint     seqno(x) = the position of x in a standard ordering

An application of char to an integer outside of the range 0 to 2^7 - 1 = 127, or an
application of seqno to a string of length other than 1, will not be reduced. Later
versions will use the operations of Section 14 to provide useful string-manipulating
operations for arbitrarily long character strings.


8. Syntactic Qualifications on Variables 


Even with a liberal set of predefined functions, there will arise cases where the set
of equations that a user wants to include in a definition is much too large ever
to type by hand. For example, in defining a LISP interpreter, it is important to define
the function atom, which tests for atomic symbols. The natural set of equations to
define this function includes atom(cons(x,y))=false, atom(a)=true,
atom(b)=true, ... atom(aa)=true, atom(ab)=true, ... . We would like to abbreviate
this large set of equations with the following two:

atom(cons(x,y)) = false;

atom(x) = true where x is either
                         in atomic_symbols
                      or in integer_numerals
                      end or
               end where

Notice that the qualification placed on the variable x is essentially a syntactic,
rather than a semantic, one. In general, we allow equations of the form:


term = term where qualification end where 


A qualification is of the form

qualification_item_1, ... qualification_item_m   m≥1

and qualification_items are of the forms

variable is qualification_term

variable_1, ... variable_n are qualification_term

and qualification_terms are of the forms

in predefined_symbol_class

term

qualification_term where qualification end where

either qualification_term_1 or ... qualification_term_n end or


Examples illustrating the forms above:

atompair_or_atom(x) = true
     where x is either
               cons(y,z) where y,z are in atomic_symbols end where
            or in atomic_symbols
            end or
     end where;

atom_int_pair(x) = true
     where x is cons(y,z)
               where y is in atomic_symbols,
                     z is in integer_numerals
               end where
     end where

If the same variable is mentioned in two different nested qualifications, the innermost
qualification applies.
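
One way to picture a qualified equation is as a pattern match guarded by a membership test on the matched subterm. The Python sketch below illustrates that reading for atom and atompair_or_atom; it is not how the interpreter works internally. The encoding is an assumption made for this example: atomic symbols are Python strings, integer numerals are Python ints, Cons is a hypothetical class, and None stands for "no equation applies, so the term is left unreduced".

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Cons:
    head: "Value"
    tail: "Value"


Value = Union[Cons, int, str]


def in_atomic_symbols(v: Value) -> bool:
    return isinstance(v, str)


def in_integer_numerals(v: Value) -> bool:
    return isinstance(v, int) and not isinstance(v, bool)


def atom(v: Value) -> Optional[bool]:
    if isinstance(v, Cons):                               # atom(cons(x,y)) = false
        return False
    if in_atomic_symbols(v) or in_integer_numerals(v):    # qualified equation
        return True
    return None


def atompair_or_atom(v: Value) -> Optional[bool]:
    if in_atomic_symbols(v):                              # x is in atomic_symbols
        return True
    if (isinstance(v, Cons) and in_atomic_symbols(v.head)
            and in_atomic_symbols(v.tail)):               # x is cons(y,z), y,z atomic
        return True
    return None                                           # qualification not met


print(atom(Cons("a", 1)), atom("a"), atom(42))
print(atompair_or_atom(Cons("a", "b")), atompair_or_atom(Cons("a", 1)))
```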


The interpretation of the restrictions on equations in Section 5 is not obvious
in the presence of qualified equations. Restrictions 1 and 2, regarding the appearance
of variables on left and right sides of equations, are applied to the unqualified
equations, ignoring the qualifying clauses. Restrictions 4 and 5, regarding possible
interactions between left sides, are applied to the results of substituting variable
qualifications for the instances of variables that they qualify. For example, the
equation

f(x) = y where x is g(y) end where

is prohibited, because the variable y is not present on the unqualified left side, and
the pair of equations

f(x) = 0 where x is g(y) end where;   g(x) = 1;

is prohibited because of the overlap in f(g(a)). In general, a variable occurring in
a where clause is local to that clause, so g(x,y) = z where x is y is equivalent to
g(x,y) = z, rather than g(x,x) = z. The details of interactions between variable
bindings and where clauses certainly need more thought, but fortunately the subtle
cases do not occur very often.


9. Miscellaneous Examples 

This section contains examples of complete equational programs that do not fit any
specific topic, but help give a general feeling for the capabilities of the interpreter.
The first ones are primitive, and should be accessible to every reader, but later
ones, such as the lambda-calculus example, are intended only for the reader
whose specialized interests agree with the topic.


9.1. List Reversal 


The following example, using the LISP.M syntax, is chosen for its triviality. The 
operation of reversal (rev) is defined using the operation of adding an element to 
the end of a list (addend). A trace of this example shows that the number of steps 
to reverse a list of length n is proportional to n^2. Notice that the usual LISP operators
car and cdr (first element, remaining elements of a list) are not needed,
because of the ability to nest operation symbols on the left-hand sides of equations. 
This example has no advantage over the corresponding LISP program, other than 
transparency of notation. It is easy to imagine a compiler that would translate 
equational programs of this sort into LISP in a very straightforward way. 
Symbols

: List constructors
   cons: 2;
   nil: 0;

: Operators for list manipulation
   rev: 1;
   addend: 2;
   include atomic_symbols.

For all x,y,z:

rev[()] = ();
rev[(x . y)] = addend[rev[y]; x];

addend[(); x] = (x);
addend[(x . y); z] = (x . addend[y; z]).


The following equations redefine list reversal in such a way that the equation 
interpreter will perform a linear-time algorithm. Just like the naive quadratic time 
version above, these equations may be compiled into a LISP program in a very 
straightforward way. 

Symbols

   cons: 2;
   nil: 0;
   rev: 1;
   apprev: 2;
   include atomic_symbols.

For all x,y,z:

rev[x] = apprev[x; ()];

: apprev[x; z] is the result of appending z to the reversal of x.
apprev[(); z] = z;
apprev[(x . y); z] = apprev[y; (x . z)].
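
The difference between the two definitions can be seen by counting equation applications. The Python sketch below transcribes both sets of equations and counts one step per application; it is a rough model only, since it ignores the interpreter's actual reduction order and pattern-matching costs. Lists are encoded as None (nil) or a (head, tail) pair, and steps, from_python, to_python, and count_steps are hypothetical helpers introduced for the illustration.

```python
steps = 0  # number of equation applications performed so far


def addend(lst, x):
    """addend[(); x] = (x);  addend[(x . y); z] = (x . addend[y; z])"""
    global steps
    steps += 1
    if lst is None:
        return (x, None)
    head, tail = lst
    return (head, addend(tail, x))


def rev(lst):
    """rev[()] = ();  rev[(x . y)] = addend[rev[y]; x]"""
    global steps
    steps += 1
    if lst is None:
        return None
    head, tail = lst
    return addend(rev(tail), head)


def apprev(lst, acc):
    """apprev[(); z] = z;  apprev[(x . y); z] = apprev[y; (x . z)]"""
    global steps
    steps += 1
    if lst is None:
        return acc
    head, tail = lst
    return apprev(tail, (head, acc))


def rev_linear(lst):
    """rev[x] = apprev[x; ()]"""
    global steps
    steps += 1
    return apprev(lst, None)


def from_python(items):
    lst = None
    for item in reversed(list(items)):
        lst = (item, lst)
    return lst


def to_python(lst):
    items = []
    while lst is not None:
        head, lst = lst
        items.append(head)
    return items


def count_steps(fn, items):
    """Run fn on the given items and report (result, steps used)."""
    global steps
    steps = 0
    result = to_python(fn(from_python(items)))
    return result, steps


for n in (10, 20, 40):
    print(n, count_steps(rev, range(n))[1], count_steps(rev_linear, range(n))[1])
```

Doubling n roughly quadruples the step count of the first version and doubles that of the second.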


9.2. Huffman Codes 


The following definition of an operator producing Huffman codes [Hu52, AHU83]
as binary trees is a little bit clumsier than the list reversals above to translate into
LISP, since the operator Huff, a new constructor combining a partially-constructed
Huffman tree with its weight, would either be omitted in a representing S-expression,
or encoded as an atomic symbol. Either way, the list constructing
operator is overloaded with two different intuitive meanings, and the expressions
become a bit harder to read.

The following equations produce Huffman codes in the form of binary trees
constructed with cons. To produce the Huffman tree for the keys K_1, K_2, ... K_n
with weights w_1, w_2, ... w_n in decreasing numerical order, evaluate the term

BuildHuff[(Huff[w_1; K_1] ... Huff[w_n; K_n])].
Symbols

: List construction operators
   cons: 2;
   nil: 0;

: Huff[w; t] represents the tree t (built from cons) having weight w.
   Huff: 2;

: Tree building operators
   BuildHuff: 1;
   Insert: 2;
   Combine: 2;

: Arithmetic and logical symbols and operators
   add: 2;
   less: 2;
   include truth_values;
   include integer_numerals;
   include atomic_symbols.

For all weight1, weight2, tree1, tree2, x, y, remainder, item:

: if is the standard conditional function, and add, less
: are the standard arithmetic operation and test.
if[true; x; y] = x;   if[false; x; y] = y;

include addint, lessint;

: BuildHuff[list] assumes that its argument is a list of weighted trees, in
: decreasing order by weight, and combines the trees into a single tree
: representing the Huffman code for the given weights.
BuildHuff[(Huff[weight1; tree1])] = tree1;
BuildHuff[(x y . remainder)] =
     BuildHuff[Insert[remainder; Combine[x; y]]]
          where x, y are Huff[weight1; tree1] end where;

: Insert[list; tree] inserts the given weighted tree into the given list of weighted
: trees according to its weight. Insert assumes that the list is in decreasing
: order by weight.
Insert[(); item] = (item);
Insert[(Huff[weight2; tree2] . remainder); Huff[weight1; tree1]] =
     if[less[weight1; weight2];
          (Huff[weight1; tree1] Huff[weight2; tree2] . remainder);
          (Huff[weight2; tree2] . Insert[remainder; Huff[weight1; tree1]])];

: Combine[t1; t2] is the combination of the weighted trees t1 and t2 resulting
: from hanging t1 and t2 from a common root, and adding their weights.
Combine[Huff[weight1; tree1]; Huff[weight2; tree2]] =
     Huff[add[weight1; weight2]; (tree1 . tree2)].


9.3. Quicksort 


The following equational program sorts a list of integers by the Quicksort procedure
[Ho62, AHU74]. This program may also be translated easily into LISP.
The fundamental idea behind Quicksort is that we may sort a list l by choosing a
value i (usually the first value in l), splitting l into the lists l_<, l_=, and l_> of
elements <i, =i, and >i, respectively. Then, sort l_< and l_> (l_= is already sorted),
and append them to get the sorted version of l. Quicksort sorts a list of n elements
in time O(n log n) on the average.
Symbols

: List construction operators
   cons: 2;
   nil: 0;

: List manipulation operators
   smaller, larger: 2;
   append: 2;
   sort: 1;

: Logical and arithmetic operators and symbols
   if: 3;
   less: 2;
   include integer_numerals, truth_values.

For all i, j, a, b, rem:

sort[()] = ();
sort[(i . rem)] = append[sort[smaller[i; rem]]; append[(i); sort[larger[i; rem]]]];

: smaller[i; a] is a list of the elements of a smaller than or equal to the integer i.
smaller[i; ()] = ();
smaller[i; (j . rem)] = if[less[i; j]; smaller[i; rem]; (j . smaller[i; rem])];

: larger[i; a] is a list of the elements of a larger than the integer i.
larger[i; ()] = ();
larger[i; (j . rem)] = if[less[i; j]; (j . larger[i; rem]); larger[i; rem]];

: append[a; b] is the concatenation of the lists a and b.
append[(); a] = a;
append[(i . rem); a] = (i . append[rem; a]);

: if and less are the standard logical and arithmetic operations.
if[true; a; b] = a;   if[false; a; b] = b;

include lessint.
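
For readers more familiar with conventional languages, the same equations may be transcribed almost line for line. The following Python sketch is such a transcription using built-in lists in place of cons cells; it follows this particular program, in which the pivot's equal elements go into smaller rather than into a separate list.

```python
from typing import List


def smaller(i: int, a: List[int]) -> List[int]:
    """Elements of a that are smaller than or equal to i (those with not i < j)."""
    return [j for j in a if not i < j]


def larger(i: int, a: List[int]) -> List[int]:
    """Elements of a that are larger than i (those with i < j)."""
    return [j for j in a if i < j]


def sort(a: List[int]) -> List[int]:
    """sort[()] = ();  sort[(i . rem)] = sort[smaller] ++ (i) ++ sort[larger]"""
    if not a:
        return []
    i, rem = a[0], a[1:]
    return sort(smaller(i, rem)) + [i] + sort(larger(i, rem))


print(sort([3, 1, 4, 1, 5, 9, 2, 6]))
```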


9.4. A Toy Theorem Prover 


The fact that sorting a list l with an equational program is equivalent to proving
sort[l] = l' based on certain assumptions, leads one to consider the similarity
between sorting by exchanges, and proving equalities based on commutativity and
associativity. Given the axioms x+y = y+x and (x+y)+z = x+(y+z), we
quickly learn to recognize that for any two additive expressions, E_1 and E_2, containing
the same summands possibly in a different order, E_1 = E_2. One way to
formalize that insight is to give a procedure that takes such E_1 and E_2 as inputs,
checks whether they are indeed equal up to commutativity and associativity, and
produces the proof of equality if there is one. The proof of equality of two terms
by commutativity and associativity amounts to a sorting of the summands in one or
both terms, with each application of the commutativity axiom corresponding to an
interchange.
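
The decision part of that procedure, without the proof, reduces to comparing sorted lists of summands. The Python sketch below illustrates only this check; it does not construct the step-by-step proof. The encoding is an assumption made here: the variable v_i is the integer i, a sum is the tuple ("plus", left, right), and summands, standard_form, and equal_up_to_ac are hypothetical names.

```python
def summands(expr):
    """Collect the variable indices of an additive expression, left to right."""
    if isinstance(expr, int):
        return [expr]
    _, left, right = expr
    return summands(left) + summands(right)


def standard_form(expr):
    """Rebuild expr as a right-nested sum with indices in nondecreasing order."""
    indices = sorted(summands(expr))
    result = indices[-1]
    for i in reversed(indices[:-1]):
        result = ("plus", i, result)
    return result


def equal_up_to_ac(a, b):
    """True when a and b have syntactically identical standard forms."""
    return standard_form(a) == standard_form(b)


a = ("plus", ("plus", 2, 1), 3)      # (v2 + v1) + v3
b = ("plus", 3, ("plus", 1, 2))      # v3 + (v1 + v2)
print(standard_form(a), equal_up_to_ac(a, b))
```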


In the following program, compare[a;b] takes two additive expressions a and
b, and produces a proof that a=b from commutativity and associativity, if such a
proof exists. Additive expressions are constructed from numbered variables, with
v[i] representing the ith variable v_i, and the syntactic binary operator plus.
Proofs are represented by lists of expressions, with the convention that each expression
in the list transforms into the next one by one application of commutativity or
associativity to some subexpression. Since equality is symmetric, proofs are correct
whether they are read forwards or backwards. The proof of a=b starts by proving
a=a', where a' has the same variables as a, combined in the standard form
v_{i_1}+(v_{i_2}+ ... (v_{i_{n-1}}+v_{i_n}) ... ), with i_j ≤ i_{j+1}. This proof of a=a' is the value of
stand[a]. A similar proof of b=b' is produced by stand[b]. Finally, stand[a] is
concatenated with the reversal of stand[b]. If the standard forms of a and b are
not syntactically identical, then there is a false step in the middle of the concatenated
proofs, and that step is marked with the special operator falsestep[].

The interesting part of the procedure above is the stand operation, proving
equality of a term with its standard form. That proof works recursively on an
expression a+b, first standardizing a and b individually, then applying the
following transformations to combine the standardized a and b into a single standard
form.

1. (v_i+a)+(v_j+b) associates to v_i+(a+(v_j+b)) when i≤j

2. (v_i+a)+(v_j+b) commutes to (v_j+b)+(v_i+a) associates to v_j+(b+(v_i+a))
   when i>j

3. (v_i+a)+v_j associates to v_i+(a+v_j) when i≤j

4. (v_i+a)+v_j commutes to v_j+(v_i+a) when i>j

5. v_i+(v_j+b) associates to (v_i+v_j)+b
   commutes to (v_j+v_i)+b associates to v_j+(v_i+b) when i>j

6. v_i+v_j commutes to v_j+v_i when i>j

In cases 1-3 above, more transformations must be applied to subexpressions. In the
following program, the merge operator performs the transformations described
above.

Symbols

: Constructors for lists
   cons: 2;
   nil: 0;
   include integer_numerals;

: Constructors for additive expressions
   plus: 2;
   v: 1;

: Errors in proofs
   falsestep: 0;

: Operators for testing and proving equality under commutativity, associativity
   compare: 2;
   stand: 1;
   merge: 1;
   plusp: 2;
   appendp: 2;

: Standard list and arithmetic operators
   addend: 2;
   equ: 2;
   include truth_values.


For all a, b, c, d, i, j, rem, rem1, rem2:

compare[a; b] = appendp[stand[a]; stand[b]];

stand[v[i]] = (v[i]);

stand[plus[a; b]] = merge[plusp[stand[a]; stand[b]]];

merge[(a b . rem)] = (a . merge[(b . rem)]);
merge[(a)] = merge[a];

: Case 6
merge[plus[v[i]; v[j]]] =
     if[less[j; i];
: commute vi and vj
          (plus[v[i]; v[j]] plus[v[j]; v[i]]);
: else
: no change
          (plus[v[i]; v[j]])];

: Case 5
merge[plus[v[i]; plus[v[j]; b]]] =
     if[less[j; i];
          (plus[v[i]; plus[v[j]; b]]
: associate vi with vj
           plus[plus[v[i]; v[j]]; b]
: commute vi and vj
           plus[plus[v[j]; v[i]]; b] .
: associate vi with b
           plusp[(v[j]); merge[plus[v[i]; b]]]);
: else
: no change
          plusp[(v[i]); merge[plus[v[j]; b]]]];

merge[plus[plus[v[i]; a]; v[j]]] =
     if[less[j; i];
: Case 4
          (plus[plus[v[i]; a]; v[j]] .
: commute vi+a and vj
           plusp[(v[j]); merge[plus[v[i]; a]]]);
: else
: Case 3
          (plus[plus[v[i]; a]; v[j]] .
: associate a with vj
           plusp[(v[i]); merge[plus[a; v[j]]]])];

merge[plus[plus[v[i]; a]; plus[v[j]; b]]] =
     if[less[j; i];
: Case 2
          (plus[plus[v[i]; a]; plus[v[j]; b]]
: commute vi+a and vj+b
           plus[plus[v[j]; b]; plus[v[i]; a]] .
: associate b with vi+a
           plusp[(v[j]); merge[plus[b; plus[v[i]; a]]]]);
: else
: Case 1
          (plus[plus[v[i]; a]; plus[v[j]; b]] .
: associate a with vj+b
           plusp[(v[i]); merge[plus[a; plus[v[j]; b]]]])];

: plusp[p; q] transforms proofs p of E1=F1 and q of E2=F2 into a proof
: of plus[E1; E2] = plus[F1; F2].
plusp[(a); rem] = plusp[a; rem];

plusp[(a b . rem1); (c . rem2)] = (plus[a; c] . plusp[(b . rem1); (c . rem2)]);

plusp[a; ()] = ()
     where a is either v[i] or plus[b; c] end or end where;

plusp[a; (b . rem)] = (plus[a; b] . plusp[a; rem])
     where a is either v[i] or plus[b; c] end or end where;

: appendp[p; q] appends proofs p and the reversal of q, coalescing the last lines
: if they are the same, and indicating an error otherwise.
appendp[(a); (b)] = if[equ[a; b]; (a); (a falsestep[] b)];
appendp[(a b . rem); c] = (a . appendp[(b . rem); c]);
appendp[(a); (b c . rem)] = addend[appendp[(a); (c . rem)]; b];

: addend[l; a] adds the element a to the end of the list l.
addend[(); a] = (a);
addend[(a . rem); b] = (a . addend[rem; b]);

: equ is extended to additive terms, as a test for syntactic equality.
equ[v[i]; v[j]] = equ[i; j];
equ[plus[a; b]; plus[c; d]] = and[equ[a; c]; equ[b; d]];
equ[v[i]; plus[c; d]] = false;
equ[plus[a; b]; v[j]] = false;

include equint;

: if, and, less are standard operators.
if[true; a; b] = a;   if[false; a; b] = b;
and[a; b] = if[a; b; false];

include lessint.


9.5. An Unusual Adder 


The following example gives a rather obtuse way to add two numbers. The intent
of the example is to demonstrate a programming technique supported by the equation
interpreter, but not by LISP, involving the definition of infinite structures. We
hope that this silly example will clarify the technique, while more substantial
examples in Sections 9.7, 15.3 and 15.4 will show its value in solving problems
elegantly. The addition program below uses an infinite list of infinite lists, in which
the ith member of the jth list is the integer i+j. In order to add two nonnegative
integers, we select the answer out of this infinite addition table. The outermost
evaluation strategy guarantees that only a finite portion of the table will actually be
produced, so that the computation will terminate.

Symbols

: List constructors.
   cons: 2;
   nil: 0;
   include integer_numerals;

: List utilities.
   element: 2;
   first: 1;
   tail: 1;
   inclist: 1;

: Standard arithmetic operators.
   add, subtract, equ: 2;
   if: 3;
   include truth_values;

: Unusual integer list, addition table and operator.
   intlist: 0;
   addtable: 0;
   weirdadd: 2.

For all i, j, x, l:

first[(x . l)] = x;
tail[(x . l)] = l;
element[i; l] = if[equ[i; 0];
                   first[l];
                   element[subtract[i; 1]; tail[l]]];

weirdadd[i; j] = element[i; element[j; addtable[]]];

addtable[] = (intlist[] . inclist[addtable[]]);
intlist[] = (0 . inclist[intlist[]]);

inclist[i] = add[i; 1] where i is in integer_numerals end where;
inclist[(i . l)] = (inclist[i] . inclist[l]);

include addint, subint, equint.
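
The same idea can be imitated in a language with lazy sequences. The Python sketch below uses generators as a stand-in for outermost evaluation, so that only the requested prefix of the infinite table is ever produced. It is an analogy, not a model of the interpreter's reduction strategy, and the function names simply mirror the operators above.

```python
from itertools import islice


def element(i, stream):
    """Return the ith member (counting from 0) of a lazy stream."""
    return next(islice(stream, i, None))


def inclist(x):
    """Add 1 to an integer, or lazily to every member of a (nested) stream."""
    if isinstance(x, int):
        return x + 1
    return (inclist(item) for item in x)


def intlist():
    """intlist[] = (0 . inclist[intlist[]]), that is 0, 1, 2, ..."""
    yield 0
    yield from inclist(intlist())


def addtable():
    """addtable[] = (intlist[] . inclist[addtable[]]); row j starts at j."""
    yield intlist()
    yield from inclist(addtable())


def weirdadd(i, j):
    """Select entry i of row j of the infinite addition table."""
    return element(i, element(j, addtable()))


print(weirdadd(3, 4))
```

Each generator computes a value only when it is demanded, which plays the role that outermost evaluation plays for the equations.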


9.6. Arbitrary-Precision Integer Operations 


The equation interpreter implementation provides the usual integer operations as
primitives, when these operations are applied to integers that may be represented in
single precision, and when the result of the operation is also single precision. In
order to provide arbitrary precision integer operations, we extend these primitive
sets of equations with some additional explicit equations.

The following equations define arbitrary-precision arithmetic on positive
integers in a straightforward way. A large base is chosen, for example base
2^15 = 32768, and the constructor extend is used to represent large numbers, with the
understanding that extend(x,i) represents x*base+i, for 0≤i<base. longadd
and longmult are the binary operators for addition and multiplication. Addition
follows the grade school algorithm of adding digits from right to left, keeping track
of a carry. Multiplication also follows the usual algorithm for hand calculation,
adding up partial products produced by multiplying one digit of the second multiplicand
with the entire first multiplicand.

Symbols

: Constructors for arbitrary-precision integers.
   extend: 2;
   include integer_numerals;

: Base for arithmetic.
   base: 0;

: Single precision arithmetic operators.
   add: 2;
   multiply: 2;

: Arbitrary-precision arithmetic operators.
   longadd: 2;
   longmult: 2;

: Operators used in defining the arithmetic operators.
   sum: 3;
   acarry: 3;
   carryadd: 3;
   prod: 3;
   mcarry: 3;
   carrymult: 3.

For all x, y, z, i, j, k:

base() = 32768;

longadd(x, y) = carryadd(x, y, 0);

carryadd(i, j, k) = if(equ(acarry(i, j, k), 0),
                       sum(i, j, k),
                       extend(acarry(i, j, k), sum(i, j, k)))
     where i, j are in integer_numerals end where;

carryadd(extend(x, i), j, k) = carryadd(extend(x, i), extend(0, j), k)
     where j is in integer_numerals end where;

carryadd(i, extend(y, j), k) = carryadd(extend(0, i), extend(y, j), k)
     where i is in integer_numerals end where;

carryadd(extend(x, i), extend(y, j), k) =
     extend(carryadd(x, y, acarry(i, j, k)), sum(i, j, k));

sum(i, j, k) = mod(add(i, add(j, k)), base());

acarry(i, j, k) = div(add(i, add(j, k)), base());

longmult(x, j) = carrymult(x, j, 0)
     where j is in integer_numerals end where;

longmult(x, extend(y, j)) =
     longadd(carrymult(x, j, 0), extend(longmult(x, y), 0));

carrymult(i, j, k) = if(equ(mcarry(i, j, k), 0),
                        prod(i, j, k),
                        extend(mcarry(i, j, k), prod(i, j, k)))
     where i is in integer_numerals end where;

carrymult(extend(x, i), j, k) =
     extend(carrymult(x, j, mcarry(i, j, k)), prod(i, j, k));
prod(i, j, k) = mod(add(multiply(i, j), k), base());

mcarry(i, j, k) = div(add(multiply(i, j), k), base());


include addint, divint, modint, multint. 
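
The addition half of this program can be checked against ordinary integers with a short Python sketch. The encoding is an assumption made for the illustration: a single digit is a Python int below BASE, and extend(x, i) is the tuple ("extend", x, i). The recursive case passes the carry to carryadd, since longadd takes only two arguments. from_int and to_int are hypothetical helpers used only for testing.

```python
BASE = 32768  # 2**15, the example base used in the text


def carryadd(x, y, k):
    """Add numerals x and y plus an incoming carry k."""
    if isinstance(x, int) and isinstance(y, int):
        carry, digit = divmod(x + y + k, BASE)       # acarry and sum
        return digit if carry == 0 else ("extend", carry, digit)
    if isinstance(x, int):
        x = ("extend", 0, x)                         # pad the shorter numeral
    if isinstance(y, int):
        y = ("extend", 0, y)
    _, x_high, x_low = x
    _, y_high, y_low = y
    carry, digit = divmod(x_low + y_low + k, BASE)
    return ("extend", carryadd(x_high, y_high, carry), digit)


def longadd(x, y):
    """longadd(x, y) = carryadd(x, y, 0)"""
    return carryadd(x, y, 0)


def from_int(n):
    """Encode a nonnegative Python int as nested extend tuples."""
    return n if n < BASE else ("extend", from_int(n // BASE), n % BASE)


def to_int(numeral):
    """Decode: extend(x, i) represents x*BASE + i."""
    if isinstance(numeral, int):
        return numeral
    _, high, low = numeral
    return to_int(high) * BASE + low


a, b = 2**40 + 12345, 2**31 - 1
print(to_int(longadd(from_int(a), from_int(b))) == a + b)
```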


The simple equational program above has several objectionable qualities.
First, in order to make the operations sum, acarry, prod, and mcarry really work,
we must choose a base much smaller than the largest integer representable in single
precision. In particular, to allow evaluation of prod and mcarry in all cases,
the base used by the program may not exceed the square root of full single precision.
This use of a small base doubles the sizes of multiple-precision integer
representations. By redefining the offending operators, we may allow full use of
single precision, but only at the cost of a substantial additional time overhead. For
example, sum would have to be defined as

sum(i, j, k) = addmod(i, addmod(j, k, base()), base());
addmod(i, j, k) = if(less(i, subtract(k, j)),
                     add(i, j),
                     add(subtract(max(i, j), k), min(i, j)));

and prod would be even more complex, because of the possibility of zeroes. Even if
we accept the reduced base for arithmetic, extra time is required to add or multiply
two single-precision numbers with a single-precision result, since we must check for
the nonexistent carry. Finally, it is distasteful to introduce new operators longadd
and longmult when our intuitive idea is to extend add and multiply.


In order to avoid these objections, we need a slightly better support from the
predefined operations add and multiply. When faced with an expression
add(α,β), where α and β are single-precision numerals, but their sum requires double
precision, the current version of the equation interpreter merely declines to
reduce. In order to support multiple-precision arithmetic cleanly and efficiently,
the implementation must be modified so that add and multiply produce results
whenever their arguments are single-precision numerals, even if the results require
double precision. The double precision results will be represented by use of the
extend operator in the program above. The only technical problem to be solved in
providing this greater support is the syntactic one: what status does the extend
operator have - need it be declared by the user? This syntactic problem is a special
case of a very general need for modular constructs, including facilities to combine
sets of equations and to hide certain internally meaningful symbols from the
user. Rather than solve the special case, we have postponed this important
improvement until the general problem of modularity is solved (see Section 14).


Given an appropriate improvement to the predefined arithmetic operations, 
arbitrary precision may be provided by the following equational program. lowdigit 
picks off the lowest order digit of an extended-precision numeral, highdigits pro- 
duces all but the lowest order digit. 

Symbols

: Constructors for arbitrary-precision integers.
   extend: 2;
   include integer_numerals;

: Arithmetic operators.
   add: 2;
   multiply: 2;

: Operators used in defining the arithmetic operators.
   lowdigit: 1;
   highdigits: 1.

For all x, y, z, i, j, k:

   add(extend(x, i), extend(y, j)) = add(extend(add(x, y), 0), add(i, j));

   add(extend(x, i), j) = extend(add(x, highdigits(add(i, j))), lowdigit(add(i, j)))
      where j is in integer_numerals end where;

   add(i, extend(y, j)) = extend(add(y, highdigits(add(i, j))), lowdigit(add(i, j)))
      where i is in integer_numerals end where;

   lowdigit(extend(x, i)) = i;

   highdigits(extend(x, i)) = x;

   multiply(x, extend(y, j)) = add(multiply(x, j), extend(multiply(x, y), 0));

   multiply(extend(x, i), j) = add(multiply(i, j), extend(multiply(x, j), 0))
      where j is in integer_numerals end where;

   include addint, multint.
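
As an informal cross-check of the addition equations above, here is a minimal Python sketch. It is not the interpreter's implementation. The tiny base `BASE = 10`, the tuple encoding of extend, the helpers `value` and `from_int`, and the treatment of a single-precision numeral as its own low digit with high part 0 are all assumptions made for illustration.

```python
BASE = 10  # assumed "single-precision" base, kept tiny so carries are visible

def extend(high, low):
    """Constructor: stands for high * BASE + low, with 0 <= low < BASE."""
    return ("extend", high, low)

def is_single(n):
    return isinstance(n, int)

def lowdigit(n):
    # Assumption: a single-precision numeral is its own lowest digit.
    return n if is_single(n) else n[2]

def highdigits(n):
    # Assumption: a single-precision numeral has high part 0.
    return 0 if is_single(n) else n[1]

def add(a, b):
    if is_single(a) and is_single(b):
        # The improved predefined add: always answers, spilling into extend.
        s = a + b
        return s if s < BASE else extend(s // BASE, s % BASE)
    if not is_single(a) and not is_single(b):
        (_, x, i), (_, y, j) = a, b
        return add(extend(add(x, y), 0), add(i, j))
    if is_single(a):
        a, b = b, a  # the third equation mirrors the second
    _, x, i = a
    s = add(i, b)
    return extend(add(x, highdigits(s)), lowdigit(s))

def value(n):
    """Hypothetical helper: decode a numeral into a Python int."""
    return n if is_single(n) else value(n[1]) * BASE + n[2]

def from_int(v):
    """Hypothetical helper: encode a nonnegative Python int."""
    return v if v < BASE else extend(from_int(v // BASE), v % BASE)
```

When both arguments and the result are single-precision, only the first branch runs, which corresponds to the claim that such operations incur no extra overhead.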


This improved equational program answers all of the objections to the first one, and
is substantially simpler. Notice that, whenever an operation is applied to
single-precision arguments, yielding a single-precision result, only the predefined
equations for the operation are applied, so there is no additional time overhead for
those operations. Negative integers may be handled through a negate operator, or by
negating every digit in the representation of a negative number. The second solution
wastes one bit in each digit, but is more compact for single-precision negative
numbers, and avoids additional time overhead for operations on single-precision
negative numbers.


9.7. Exact Addition of Real Numbers 


Another interesting example of programming with infinite lists involves exact
computations on the constructive real numbers [Bi67, Br79, My72]. In principle, a
constructive real number is a program enumerating a sequence of rational intervals
converging to a real number. Explicit use of these intervals is quite clumsy
compared to the more intuitive representation of reals as infinite decimals.
Unfortunately, addition is not computable over infinite decimals. Suppose we try to add
0.99··· to 1.00···. No matter how many digits of the two summands we have
seen, we cannot decide for sure whether the sum should be 1.··· or 2.···. If
the sequence of 9s in the first summand ever drops to a lower digit, then the sum
must be of the form 1.···; if the sequence of 0s in the second summand ever rises
to a higher digit, then the sum must be of the form 2.···. As long as the 9s and
0s continue, we cannot reliably produce the first digit of the sum. Ironically, in
exactly the case where we can never decide whether to use 1.··· or 2.···, either
one would be right, since 1.99··· = 2.00···. One aspect of the problem is that
conventional infinite decimal notation allows multiple representations of certain
numbers, such as 1.99··· = 2.00···, but requires a unique representation of
others, such as 0.11···. The solution is to generalize the notation so that every
number has multiple representations, by allowing individual digits to be negative as
well as positive. This idea was proposed for bit-serial operations on
varying-precision real numbers [Av61, At75, O179].


Let the infinite list of integers $(d_0\ d_1\ d_2 \cdots)$ be used to represent the
real number $\sum_{i=0}^{\infty} d_i * 10^{-i}$. $d_0$ is the integer part, and there
is an implicit decimal point between $d_0$ and $d_1$. In conventional base 10
notation, each $d_i$ for $i \geq 1$ would be limited to the range [0,9]. Suppose that
the range [-9,+9] is used instead ([-5,+5] suffices, in fact, but leads to a clumsier
program). As a result, every real number has more than one representation. In
particular, the intervals corresponding to finite decimals overlap, so that every
real number is in the interior of arbitrarily small intervals. For a conventional
finite decimal α, let $I_{0,9}(\alpha)$ denote the interval of real numbers having
conventional representations beginning with α. Similarly, $I_{-9,9}(\alpha)$ denotes
the interval of real numbers having extended representations beginning with α, where
α is a finite decimal with digits from -9 to 9.

The problem with the conventional notation is that certain real numbers do
not lie in the interiors of any small intervals $I_{0,9}(\alpha)$, but only on the
endpoints. When generating the decimal expansion of a real number x, it is not safe
to specify an interval with x at an endpoint, since an arbitrarily small correction
to x may take us out of that interval. For example, 1.1 is only an endpoint of the
intervals $I_{0,9}(1.1) = [1.1, 1.2]$, $I_{0,9}(1.10) = [1.1, 1.11]$,
$I_{0,9}(1.100) = [1.1, 1.101]$, etc., and the smallest interval $I_{0,9}(\alpha)$
with 1.1 in its interior is $I_{0,9}(1) = [1, 2]$. On the other hand, 1.1 is in the
interior of each interval $I_{-9,9}(1.1) = [1, 1.2]$, $I_{-9,9}(1.10) = [1.09, 1.11]$,
$I_{-9,9}(1.100) = [1.099, 1.101]$, etc., because the larger number of digits
stretches these intervals to twice the width of the conventional ones, yielding
enough overlaps of intervals to avoid the singularities of the conventional notation.
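
The interval endpoints quoted above can be checked with exact rational arithmetic. The following is a small sketch, not part of the original program; the function name `interval` and its argument convention (integer part first, then the fixed fractional digits, then the allowed digit range) are assumptions made for illustration.

```python
from fractions import Fraction

def interval(digits, lo, hi):
    """Closed interval of reals whose expansion begins with `digits`
    (integer part first) when every later digit ranges over [lo, hi]."""
    n = len(digits) - 1  # number of fractional digits already fixed
    base = sum(Fraction(d, 10 ** k) for k, d in enumerate(digits))
    # A constant digit d in all positions after n contributes d * 10**-n / 9.
    tail = Fraction(1, 9 * 10 ** n)
    return (base + lo * tail, base + hi * tail)

conventional = interval([1, 1], 0, 9)    # 1.1 sits on the left endpoint
extended = interval([1, 1], -9, 9)       # 1.1 sits strictly inside
```

With digits restricted to [0,9] the prefix 1.1 yields an interval starting exactly at 1.1, whereas with digits in [-9,+9] the same prefix yields an interval of twice the width centered on 1.1.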

The notation described above is a fixed point notation. Infinite floating point
decimals may also be defined, allowing $d_0$ to be restricted to the range [-9,+9] as
well. Such an extension of the notation makes the programs for arithmetic operations
more complex, but does not introduce any essential new ideas. Conventional
computer arithmetic on real numbers truncates the infinite representation of a real
to some finite precision. The equation interpreter is capable of handling infinite
lists, so, except for the final output, it may manipulate exact real numbers.


In order to program addition of infinite-precision real numbers, as described
above, we mimic a program in which the two input numbers are presented to a process
called addlist, which merely adds corresponding elements in the input lists to
produce the output list. Notice that the output from addlist represents the real
sum of the two inputs, but has digits in the range [-18,+18]. The output from
addlist goes to a process called compress, which restores the list elements to the
range -9 to +9. The output from compress is the desired result. The function
add is defined by the composition of addlist and compress. Notice that, while
addlist produces one output digit for every input pair, compress must see more
than one input digit in order to produce a single output digit. Looking at $d_i$ and
$d_{i+1}$, where $d_i$ has already been compressed to the range [-8,+8], compress
adjusts $d_{i+1}$ into the range [-8,+8] by adding or subtracting 10, if necessary,
and compensating by adjusting $d_i$ by $\pm 1$. Notice that it is important to first
place a digit in [-8,+8], so that it may be used in the adjustment of the next digit
and stay in [-9,+9].
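
The two-process pipeline just described can be sketched with Python generators standing in for infinite lists. This is only an illustration under stated assumptions: digit lists are iterables with the integer part first, the input is nonempty, and the helper `prefix_value` and the final `yield` for finite inputs are additions that exist only to make the sketch testable.

```python
from fractions import Fraction
from itertools import chain, islice, repeat

def addlist(xs, ys):
    """Add corresponding digits; each result lies in [-18, +18]."""
    for a, b in zip(xs, ys):
        yield a + b

def compress(ds):
    """Bring digits from [-18, +18] back into [-9, +9], one digit of lookahead."""
    ds = iter(ds)
    i = next(ds)                  # pending digit (initially the integer part)
    for j in ds:
        if j < -8:
            i, j = i - 1, j + 10  # borrow from the digit on the left
        elif j > 8:
            i, j = i + 1, j - 10  # carry into the digit on the left
        yield i                   # i can no longer change
        i = j                     # j is now in [-8, +8]
    yield i                       # reached only for finite test inputs

def add(xs, ys):
    return compress(addlist(xs, ys))

def prefix_value(ds):
    """Exact value of a finite digit list, integer part first (test helper)."""
    return sum(Fraction(d, 10 ** k) for k, d in enumerate(ds))

# 0.999... + 1.000...: a first digit appears after only two input digits.
nines = chain([0], repeat(9))
zeros = chain([1], repeat(0))
print(list(islice(add(nines, zeros), 6)))  # prints [2, 0, 0, 0, 0, 0]
```

The example is the problematic sum from the beginning of this section: with signed digits the output 2.000··· is produced immediately, because a later drop in the 9s could still be absorbed by a negative digit.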

In order to use the addition procedure, we need to provide some interesting
definitions of infinite-precision real numbers, and also a function called standard to
produce the output in conventional base-10 with finite precision. standard takes
two arguments: a single integer i, and an infinite-precision real r. The result is the
standard base-10 representation of r to i significant digits. Notice that standard
may require i+1 digits of input in order to produce the first digit of output.

Symbols

: List constructors
   cons: 2;
   nil: 0;

: List manipulation operators
   first: 1;
   tail: 1;

: Real arithmetic operator
   add: 2;

: Other operators needed in the definitions
   addlist: 2;
   compress: 1;
   stneg, stpos: 3;
   revneg, revpos: 2;
   rotate: 1;

: Input and output operators
   repeat: 2;
   standard: 2;

: Standard arithmetic and logical operators and symbols.
   if: 3;
   equ, less, greater: 2;
   include integer_numerals, truth_values.


For all x, y, i, j, k, l, a, b:

: add is extended to infinite-precision real numbers.

   add[(i . x); (j . y)] = compress[addlist[(i . x); (j . y)]];

: addlist adds corresponding elements of infinite lists.

   addlist[(i . x); (j . y)] = (add[i; j] . addlist[x; y]);

: compress normalizes an infinite list of digits in the range [-18, +18]
: into the range [-9, +9].

   compress[(i j . x)] =
      if[less[j; -8]; (subtract[i; 1] . compress[(add[j; 10] . x)]);
         if[greater[j; 8]; (add[i; 1] . compress[(subtract[j; 10] . x)]);
            (i . compress[(j . x)])]];

: repeat[x; y] is the infinite extension of the decimal expansion x
: by repeating the finite sequence y.

   repeat[(i . x); y] = (i . repeat[x; y]);
   repeat[(); y] = (first[y] . repeat[(); rotate[y]]);

: rotate[y] rotates the first element of the list y to the end.

   rotate[(i)] = (i);
   rotate[(i j . x)] = (j . rotate[(i . x)]);


: standard[i; a] is the normal base 10 expansion of the first i digits of a.

   standard[j; (i . a)] = if[equ[j; 0]; ();
                             if[equ[i; 0]; (0 . standard[subtract[j; 1]; a]);
                                if[less[i; 0]; stneg[j; (i . a); ()];
                                   stpos[j; (i . a); ()]]]];

: stneg[i; a; b] translates the first i digits of a into normal base 10 notation,
: backwards, and appends b, assuming that a is part of a negative number.
: stpos[i; a; b] does the same thing for positive numbers.

   stneg[j; (i . a); b] = if[equ[j; 0];
                             revneg[b; ()];
                             stneg[subtract[j; 1]; a; (i . b)]];

   stpos[j; (i . a); b] = if[equ[j; 0];
                             revpos[b; ()];
                             stpos[subtract[j; 1]; a; (i . b)]];

: revneg[a; b] reverses the finite decimal expansion a, borrowing and carrying so as
: to make each digit nonpositive, finally appending the list b.
: revpos[a; b] does the same, making each digit nonnegative.

   revneg[(); b] = b;
   revneg[(i . a); b] = if[less[i; 1];
                           revneg[a; (i . b)];
                           revneg[(add[first[a]; 1] . tail[a]);
                                  (add[i; -10] . b)]];

   revpos[(); b] = b;
   revpos[(i . a); b] = if[less[i; 0];
                           revpos[(add[first[a]; -1] . tail[a]);
                                  (add[i; 10] . b)];
                           revpos[a; (i . b)]];


: first, tail, if, add, subtract, equ, less, are standard list,
: conditional, and arithmetic operators.

   first[(i . a)] = i;   tail[(i . a)] = a;

   if[true; x; y] = x;   if[false; x; y] = y;

   greater[i; j] = less[subtract[0; i]; subtract[0; j]]
      where i, j are in integer_numerals end where;

   include addint, subint, equint, lessint.


In this example, producing output in standard form was much more difficult than
performing the addition. Other arithmetic operations, however, such as
multiplication, are much more difficult to program.


9.8. Polynomial Addition 


Polynomial addition is a commonplace program in LISP, with polynomials
represented by lists of coefficients. The equation interpreter allows polynomial
sums to be computed in the same notation that we normally use to write polynomials,
with no distinction between the operator add that applies to integer numerals,
the operator add that applies to polynomials, and the operator add used to
construct polynomials. In LISP, the first would be PLUS, the second a user-defined
function, perhaps called POLYPLUS, and the third would be encoded by a particular
use of cons.


In effect, the equational programs shown below for "adding polynomials" are
really just simplifying polynomial expressions into a natural canonical form. The
Horner-rule form for a polynomial of degree n in the variable X is
$c_0 + X*(c_1 + X*(\cdots + X*c_n \cdots))$. The list $(c_0\ c_1 \cdots c_n)$,
typically used to represent the same polynomial in LISP, is a very simple encoding of
the Horner-rule form. In the equation interpreter, we may use the Horner-rule form
literally. The resulting program simplifies terms of the form $\pi_1 + \pi_2$, where
each of $\pi_1$ and $\pi_2$ is in Horner-rule form, to an equivalent Horner-rule
form. Notice that the symbol add in the following programs may be an active symbol or
a static constructor for polynomials, depending on context. Also, notice that the
variable X over which the polynomials are expressed is not a variable with respect to
the equational program, but an atomic_symbol.
Symbols

   add: 2;
   multiply: 2;
   include integer_numerals, atomic_symbols.

For all i, j, a, b:

   add(add(i, multiply(X, a)), add(j, multiply(X, b))) =
      add(add(i, j), multiply(X, add(a, b)))
         where i, j are in integer_numerals end where;

   add(i, add(j, multiply(X, b))) =
      add(add(i, j), multiply(X, b))
         where i, j are in integer_numerals end where;

   add(add(i, multiply(X, a)), j) =
      add(add(i, j), multiply(X, a))
         where i, j are in integer_numerals end where;

   include addint.


The program above is satisfyingly intuitive, but does not remove high-order 0
coefficients. Thus, (1+X*(2+X*3))+(1+X*(2+X*-3)) reduces to 2+X*(4+X*0)
instead of the more helpful 2+X*4. Getting rid of the high-order zeroes is tricky,
since the natural equations X*0 = 0 and a+0 = a suffer from overlaps with the
other equations. One solution, shown below, is to check for zeroes before
constructing a Horner-rule form, rather than eliminating them afterwards.
Symbols

   add: 2;
   multiply: 2;
   if: 3;
   equ: 2;
   and: 2;
   include integer_numerals, atomic_symbols, truth_values.

For all i, j, a, b, c, d:

   add(add(i, a), add(j, b)) = add(add(i, j), add(a, b))
      where i, j are in integer_numerals,
            a, b are multiply(c, d)
      end where;

   add(i, add(j, b)) = add(add(i, j), b)
      where i, j are in integer_numerals,
            b is multiply(c, d)
      end where;

   add(add(i, a), j) = add(add(i, j), a)
      where i, j are in integer_numerals,
            a is multiply(c, d)
      end where;

   add(multiply(X, a), multiply(X, b)) =
      if(equ(add(a, b), 0), 0, multiply(X, add(a, b)));

   equ(add(i, multiply(X, a)), add(j, multiply(X, b))) =
      and(equ(i, j), equ(a, b))
         where i, j are in integer_numerals end where;

   equ(i, add(j, multiply(X, b))) = false
      where i, j are in integer_numerals end where;

   equ(add(i, multiply(X, a)), j) = false
      where i, j are in integer_numerals end where;

   if(true, a, b) = a;   if(false, a, b) = b;
   and(a, b) = if(a, b, false);

   include addint, equint.

It is amusing to consider other natural forms of polynomials, such as the
power-series form, $c_0*X^0 + c_1*X^1 + \cdots + c_n*X^n$. This corresponds to the
representation of polynomials as lists of exponent-coefficient pairs. For dense
polynomials, the exponents waste space, but for sparse polynomials the omission of
internal zeroes may make up for the inclusion of exponents, as in $1+X^n$ with large
$n$. A nice equational programming challenge is to produce an elegant program for
addition of polynomials in power-series form.
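
The idea of checking for zeroes before constructing a Horner-rule form, rather than eliminating them afterwards, can be illustrated outside the equational language. The following Python sketch is an illustration only: the encoding of `c + X*r` as the pair `(c, r)` and the helper `show` are assumptions, not the interpreter's representation.

```python
def add(p, q):
    """Add two polynomials in Horner-rule form.

    A polynomial is an int c, or a pair (c, r) standing for c + X*r,
    where r is itself a polynomial and is never the zero polynomial.
    """
    if isinstance(p, int) and isinstance(q, int):
        return p + q
    if isinstance(p, int):
        return (p + q[0], q[1])
    if isinstance(q, int):
        return (p[0] + q, p[1])
    c = p[0] + q[0]
    rest = add(p[1], q[1])
    # Check for a zero high-order part *before* building c + X*rest.
    return c if rest == 0 else (c, rest)

def show(p):
    """Render a polynomial in the notation used in the text."""
    if isinstance(p, int):
        return str(p)
    c, r = p
    inner = show(r)
    return f"{c}+X*{inner}" if isinstance(r, int) else f"{c}+X*({inner})"

# (1+X*(2+X*3)) + (1+X*(2+X*-3))
total = add((1, (2, 3)), (1, (2, -3)))
```

On the example from the text, `show(total)` gives `2+X*4` rather than `2+X*(4+X*0)`, because the cancelled coefficient is detected while the sum of the inner parts is being assembled.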


9.9. The Combinator Calculus


Weak reduction in the combinator calculus [CF58, St72] is a natural sort of
computation to describe with equations. The following equations use the Lambda
syntax of Section 4.3 to allow the abbreviation

   $(a_1\ a_2 \cdots a_n)$

for the expression

   $AP(AP(\cdots AP(a_1, a_2), \cdots), a_n)$

The symbol Lambda from Section 4.3 is not used in the combinator calculus.

Symbols

   AP: 2;
   S, K, I: 0;
   include atomic_symbols.

For all x, y, z:

   (S x y z) = (x z (y z));
   (K x y) = x;
   (I x) = x.


This example, and the polynomial addition of Section 9.8, differ from the first 
ones in that the only symbol that can construct complex expressions, AP, appears 
(implicitly) at the head of left-hand sides of equations. In many interesting sys- 
tems of terms, there are one or more symbols that do not appear at the heads of 
left-hand sides, so that they may be used to construct structures that are stable 
with respect to reduction. These stable structures may be analyzed and rearranged 
by other operators. For example, in LISP, the symbol cons is a constructor, and 
an expression made up only of cons and atomic symbols (i.e., an S-expression), is 
always in normal form. It is helpful to notice the existence of constructors when
they occur, but the example above illustrates the usefulness of allowing systems
without constructors. The use of constructors is discussed further in Section 12.1.
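
For readers who want to experiment with the three combinator equations outside the equation interpreter, here is a minimal Python sketch of leftmost-outermost weak reduction. It is an illustration only: the encoding of AP as a 2-tuple, the helper `app`, and the step limit in `normalize` are assumptions.

```python
def app(*terms):
    """Left-nested application: app(a, b, c) stands for AP(AP(a, b), c)."""
    t = terms[0]
    for u in terms[1:]:
        t = (t, u)
    return t

def step(t):
    """One leftmost-outermost weak-reduction step, or None if t is normal."""
    if not isinstance(t, tuple):
        return None                      # S, K, I and atomic symbols
    f, z = t
    if f == "I":                         # (I x) = x
        return z
    if isinstance(f, tuple):
        g, y = f
        if g == "K":                     # (K x y) = x
            return y
        if isinstance(g, tuple) and g[0] == "S":
            x = g[1]                     # (S x y z) = (x z (y z))
            return ((x, z), (y, z))
    r = step(f)                          # no redex at the root: go left first
    if r is not None:
        return (r, z)
    r = step(z)
    return None if r is None else (f, r)

def normalize(t, limit=1000):
    """Reduce until no rule applies; give up after `limit` steps."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step limit")

# (S I I (S I I)) reduces forever, but K discards it under outermost reduction.
omega = app(app("S", "I", "I"), app("S", "I", "I"))
```

Here `normalize(app("S", "K", "K", "a"))` yields `"a"`, and `normalize(app("K", "a", omega))` also yields `"a"` because the outermost K redex is contracted before the nonterminating argument is touched.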


9.10. Beta Reduction in the Lambda Calculus 
This example should only be read by a user with previous knowledge of the lambda
calculus. The reader needs to read de Bruijn's article [deB72] in order to
understand the treatment of variables. The object is to reduce an arbitrary lambda
term to normal form by a sequence of β-reductions. A number of rather tricky
problems are encountered, but some of the usual problems encountered by other
implementations of β-reduction are avoided by use of the equation interpreter. This
example uses the Lambda notation of Section 4.3.

The lambda calculus of Church [Ch41] presents several sticky problems for
the design of an evaluator. First, the problem of capture of bound variables
appears to require the use of the α-rule

   \x.E → \y.E[y/x]

to change bound variables. There is no simple way to generate the new variables
needed for application of the α-rule in an equational program. In more
conventional languages, variable generation is simple, but its presence clutters up
the program, and causes the outputs to be hard to read.


De Bruijn [deB72] gives a notation for lambda terms in which an occurrence 
of a bound variable is represented by the number of lambda bindings appearing 
between it and its binding occurrence. This notation allows a simple and elegant 
solution of the technical problem of capture, but provides an even less readable out- 
put. We represent each bound variable by a term var[x,i], where x is the name of 
the variable and i is the de Bruijn number. The first set of equations below 
translates a lambda term in traditional notation into this modified de Bruijn nota- 
tion. De Bruijn notation normally omits the name of the bound variable immedi- 
ately after an occurrence of lambda, but we retain the name of the variable for 
readability. We write the de Bruijn form of a lambda binding as \:x.E (the x 
appears directly as an argument to the lambda) to distinguish from the traditional 
notation \x.E (in which the first argument to lambda is the singleton list contain- 
ing x). 

Symbols

: Operators constructing lambda expressions
   Lambda: 2;
   AP: 2;

: Constructors for lists
   cons: 2;
   nil: 0;

: var[x, i] represents the variable with de Bruijn number i, named x.
   var: 2;

: bindvar carries binding instances of variables to the corresponding bound
: instances, computing de Bruijn numbers on the way.
   bindvar: 3;

: Arithmetical and logical operators and symbols
   if: 3;
   equ: 2;
   add: 2;
   include atomic_symbols, truth_values, integer_numerals.

For all x, y, E, F, i, j:

: Multiple-argument lambda bindings are broken into sequences of lambdas.

   \x y:rem.E = \x.\y:rem.E;

: Single-argument lambda bindings are encoded in de Bruijn notation.

   \x.E = \:x.bindvar[E, x, 0];

: bindvar[E, x, i] attaches de Bruijn numbers to all free instances of the variable
: x in the lambda-term E, assuming that E is embedded in exactly i
: lambda bindings within the binding instance of x.

   bindvar[x, y, i] = if[equ[x, y], var[x, i], x]
      where x is in atomic_symbols end where;

   bindvar[var[x, j], y, i] = var[x, j];

   bindvar[(E F), y, i] = (bindvar[E, y, i] bindvar[F, y, i]);

   bindvar[\:x.E, y, i] = \:x.bindvar[E, y, add[i, 1]]
      where x is in atomic_symbols end where;

: if is the standard conditional function, equ the standard equality test, and add
: the standard addition operator on integers.

   if[true, E, F] = E;   if[false, E, F] = F;

   include equatom, addint, equint.
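
The translation into the modified de Bruijn notation can be mirrored in a few lines of Python. This is a sketch under stated assumptions, not the equational program itself: terms are encoded as strings for atomic symbols and tagged tuples `("lam", x, body)`, `("app", f, a)`, `("dlam", x, body)` and `("var", x, i)`, and inner lambdas are converted before outer ones so that bindvar only ever meets lambdas already in de Bruijn form.

```python
def bindvar(e, y, i):
    """Attach de Bruijn number i (plus lambdas passed) to free instances of y in e."""
    if isinstance(e, str):                 # atomic symbol
        return ("var", e, i) if e == y else e
    tag = e[0]
    if tag == "var":                       # already bound by an inner lambda
        return e
    if tag == "app":
        return ("app", bindvar(e[1], y, i), bindvar(e[2], y, i))
    if tag == "dlam":                      # one more lambda between y and its binder
        return ("dlam", e[1], bindvar(e[2], y, i + 1))
    raise ValueError("bindvar expects inner lambdas in de Bruijn form")

def to_debruijn(e):
    """Translate a traditional lambda term, innermost bindings first."""
    if isinstance(e, str):
        return e
    tag = e[0]
    if tag == "app":
        return ("app", to_debruijn(e[1]), to_debruijn(e[2]))
    if tag == "lam":
        return ("dlam", e[1], bindvar(to_debruijn(e[2]), e[1], 0))
    return e                               # var and dlam nodes are left alone

# \x.\y.(x y)
example = to_debruijn(("lam", "x", ("lam", "y", ("app", "x", "y"))))
```

For the example, `x` receives the number 1 (one lambda lies between it and its binder) and `y` receives 0, while the variable names are retained for readability as in the text.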


In order to perform evaluation of a lambda term in de Bruijn notation, the
transformation described above must be done logically prior to the actual
β-reduction steps. In principle, equations for de Bruijn notation and β-reduction
could be combined into one specification, but it seems to be rather difficult to avoid
overlapping left-hand sides in such a combined specification (see Section 5,
restriction 4). At any rate, it makes logical sense to think of the transformation to
de Bruijn notation as a syntactic preprocessing step, rather than part of the
semantics of β-reduction. Therefore, we built a custom syntactic preprocessor for the
β-reduction equations. After executing

   loadsyntax Lambda

int.in (the syntactic preprocessor for the command ei) is the shell script:

   #! /bin/sh
   SYSTEM/Syntax/Outersynt/Lambda/int.in SYSTEM |
   SYSTEM/Syntax/Common/int.in.trans SYSTEM |
   SYSTEM/Syntax/Common/int.in.fin SYSTEM ;

where SYSTEM is the directory containing equational interpreter system libraries,
differing in different installations. We edited int.in to look like

   #! /bin/sh
   SYSTEM/Syntax/Outersynt/Lambda/int.in SYSTEM |
   SYSTEM/Syntax/Common/int.in.trans SYSTEM |
   SYSTEM/Syntax/Common/int.in.fin SYSTEM |
   DEBRUIJN/interpreter;

where DEBRUIJN is the directory in which we constructed the transformation to
de Bruijn notation. We did not change pre.in (the syntactic preprocessor for ep)
or int.out (the pretty-printer for ei).
Even with the elegant de Bruijn notation, two technical problems remain.
First, the official definition of β-reduction:

   (\x.E F) → E[F/x]

cannot be written as a single equation, since the equation interpreter has no
notation for syntactic substitution (see [K180a] for a theoretical discussion of term
rewriting systems with substitution). The obvious solution to this problem is to
introduce a symbol for substitution, and define its operation recursively with
equations. A nice version of this solution is given by Staples [St79], along with a
proof that leftmost-outermost evaluation is optimal for his rules. For notational
economy, we take advantage of the fact that the lambda term \x.E may be used to
represent syntactic substitution, so that no explicit symbol for substitution is
required. Combining this observation with Staples' rules, we produced the following
recursive version of β-reduction:


   (\x.x G) → G
   (\x.y G) → y    where x and y are different variables
   (\x.(E F) G) → ((\x.E G) (\x.F G))
   (\x.\y.E G) → \y.(\x.E G)

These rules may be translated straightforwardly into the de Bruijn notation, and
written as equations, using a conditional function and equality test to combine the
first two rules into one equation. In the de Bruijn form, occurrences of lambda
must be annotated with integers indicating how many other instances of lambda
have been passed in applications of the fourth rule above. Otherwise, there would
be no way to recognize the identity of the two instances of the same variable in the
first rule. Initially and finally, all of these integer labels on lambdas will be 0;
only in intermediate steps of substitution will they acquire higher values.


Unfortunately, the left-hand side of the third rule overlaps itself, violating
restriction 4 of Section 5. To avoid this overlap, we introduce a second application
operator, IAP, to distinguish applications that are not the heads of rules. The
third rule above is restricted to the case where E is applied to F by IAP. Since
\x.(E F) is applied to G by the usual application operator, AP, there is no
overlap. Interestingly, Staples introduced essentially the same restriction in a
different notation because, without the restriction, leftmost-outermost reduction is
not optimal. This technique for avoiding overlap is discussed in Section 12.4. [OS84]
develops these ideas for evaluating lambda-terms more thoroughly, but not in the
notation of the equation interpreter. All of the observations above lead to the
following equations.

Symbols

: Constructors for lists
   cons: 2;
   nil: 0;

: Constructors for lambda-terms
: IAP represents an application that is known to be inert (cannot become the head
: of a redex as the result of reductions in the subtree below).
   Lambda: 2;
   incvars: 2;

: Arithmetical and logical operators
   if: 3;
   add: 2;
   equ: 2;
   less: 2;
   include atomic_symbols, truth_values, integer_numerals.

For all x, y, z, E, F, G, i, j:

: Detect inert applications and mark them with IAP.

   (x E) = IAP[x, E] where x is either
                                var[y, i]
                             or IAP[F, G]
                             or in atomic_symbols
                          end or
               end where;

: \x:i.E represents a lambda expression that has passed by i other instances of
: lambda. It is necessary to count such passings in order to recognize instances
: of the bound variable corresponding to the x above. Only active instances of
: lambda, that is, ones that are actually applied to something, are given an
: integer tag of this sort.

   (\:x.E F) = (\x:0.E F)
      where x is in atomic_symbols end where;

   (\y:i.var[x, j] E) = if[equ[i, j], E,
                           if[less[i, j], var[x, add[j, -1]],
                              var[x, j]]]
      where i is in integer_numerals end where;

   (\y:i.x E) = x where x is in atomic_symbols,
                        i is in integer_numerals
                  end where;

   (\y:i.IAP[E, F] G) = ((\y:i.E G) (\y:i.F G))
      where i is in integer_numerals end where;

   (\y:i.\:z.E F) = \:z.(\y:add[i, 1].E F)
      where i is in integer_numerals end where;

   incvars[var[x, i], j] = if[less[i, j], var[x, i], var[x, add[i, 1]]];

   incvars[x, i] = x where x is either in atomic_symbols
                                    or in integer_numerals
                                    or in truth_values
                                    or in character_strings
                             end or
                   end where;

   incvars[IAP[E, F], i] = IAP[incvars[E, i], incvars[F, i]];

   incvars[\x:t.E, i] = \x:incvars[t, i].incvars[E, add[i, 1]];

   if[true, x, y] = x;   if[false, x, y] = y;

   include equint, addint, lessint.


Certain other approaches to the lambda calculus, such as the evaluation
strategy in LISP, avoid some of the notational problems associated with overlapping
left-hand sides by using an evaluation operator. Essentially, such techniques
reduce eval[E] to the normal form of E, rather than reducing E itself. Such a
solution could be programmed with equations, but it introduces two more problems,
both of which exist in standard implementations of LISP. Notice that the outermost
evaluation strategy used by the equation interpreter exempted us from worrying
about cases, such as (\x.y (\x.(x x) \x.(x x))), which have a normal form, but
also an infinite sequence of reductions. Implementations of the lambda calculus
using an evaluation operator must explicitly program leftmost-outermost evaluation,
else they will compute infinitely on such examples without producing the normal
form. Also, in terms, such as \x.(\y.y z), whose normal forms contain lambdas (in
this case, the normal form is \x.z), it is very easy to neglect to evaluate the body of
the unreduced lambda binding. Using rules that reduce lambda terms directly puts
the onus on the equation interpreter to make sure that these rules are applied
wherever possible.
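
The two pitfalls named above can be seen concretely in a small leftmost-outermost reducer. The Python sketch below uses the textbook nameless representation with `shift` and `subst`, which is a swapped-in technique and not the tagged-lambda scheme of the equations above; the term encoding (ints for bound indices, strings for free atoms, `("lam", body)` and `("app", f, a)` tuples) and the step limit are assumptions.

```python
def shift(t, d, cutoff=0):
    """Add d to every index in t that is free with respect to `cutoff` binders."""
    if isinstance(t, int):
        return t + d if t >= cutoff else t
    if isinstance(t, str):
        return t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, k, s):
    """Replace index k in t by s, lowering the indices above k by one."""
    if isinstance(t, int):
        if t == k:
            return shift(s, k)
        return t - 1 if t > k else t
    if isinstance(t, str):
        return t
    if t[0] == "lam":
        return ("lam", subst(t[1], k + 1, s))
    return ("app", subst(t[1], k, s), subst(t[2], k, s))

def step(t):
    """One leftmost-outermost beta step, or None if t is in normal form."""
    if isinstance(t, (int, str)):
        return None
    if t[0] == "lam":                      # do not forget lambda bodies
        r = step(t[1])
        return None if r is None else ("lam", r)
    _, f, a = t
    if isinstance(f, tuple) and f[0] == "lam":
        return subst(f[1], 0, a)           # outermost redex first
    r = step(f)
    if r is not None:
        return ("app", r, a)
    r = step(a)
    return None if r is None else ("app", f, r)

def normalize(t, limit=1000):
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step limit")

self_app = ("lam", ("app", 0, 0))          # \x.(x x)
omega = ("app", self_app, self_app)        # has no normal form
```

With this sketch, `normalize(("app", ("lam", "y"), omega))` returns `"y"` without ever reducing the divergent argument, and `normalize(("lam", ("app", ("lam", 0), "z")))` returns `("lam", "z")`, showing reduction inside an unapplied lambda.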


9.11. Lucid 


Lucid is a programming language designed by Ashcroft and Wadge [AW76,
AW77] to mimic procedural computation with nonprocedural semantics. Early
attempts to construct interpreters [Ca76] and compilers [Ho78] encountered serious
difficulties. The following set of equations, adapted from [HO82b], produces a Lucid
interpreter directly, using the Standmath syntax of Section 4.1. A trivial Lucid
program, itself consisting of equations, is appended to the end of the equations that
define Lucid. Evaluation of the expression output() produces the result of running
the Lucid program. Even though convenience and performance considerations
require the eventual production of a hand-crafted Lucid interpreter, such as the
one in [Ca76], the ability to define and experiment with the Lucid language in the
simple and relatively transparent form below would certainly have been helpful in
the early stages of Lucid development.


: Equations for the programming language Lucid, plus a Lucid 
: program generating a list of integers. 


Symbols 


: Lucid symbols 
NOT; I; 
OR: 2; 
add: 2; 
equ: 2; 
if: 3; 
first: 1; 
next: 1; 
asa: 2; 
latest: 1; 
latestinv: 1; 


Soy: 2; 

include integer_numerals, truth_values; 
: symbols in the Lucid program 

intlist: 0; 

output: 0. 
For all W, X, Y, Z: 


: Definitions of the Lucid operators 


64 9. Miscellaneous Examples 


NOT(true) = false; 

NOT (false) = true; 

NOT (fby(W, X)) = foy(NOT (first(W)), NOT(X)); 
NOT (latest (X)) = latest(NOT(X)); 


OR(true, X) = true; 
OR(false, X) = X; 


ORG CW, X), [oy VY, Z)) = 
OR (first(W), first(Y)), OR(X, Z)); 


Bs »(W, X), latest(¥)) = 
Soy (ORGirst(W), latest(Y)), OR(X, latest(Y))); 


OR(latest(x), foy(Y, Z)) 
foy(OR (latest (X), first(¥)), OR(latest(X), Z)); 


OR(foy(W, X), false) = foy(W, X); 
OR(latest(X), latest(Y)) = latest(OR(X, Y)); 
OR(latest(X), false) = latest (X); 


if(true, Y, Z) = Y; 
if(false, Y, Z) = Z; 
if(by(W, X), Y, Z) = foy(if(frst(W), Y, Z), if(X, Y, Z); 
if(latest(X), Y, Z) = latest(if(X, Y, Z)); 
first(X) = 

where Xn is either in truth_values 

or in integer_numerals 
end or 
end where; 


first (fby(X, Y)) = first(X); 
first (latest(X)) = latest(X); 


next(X) =X 
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where X is either in truth_values 
or in integer_numerals 
end or 
end where, 


next (fby(X, Y)) = Y; 
next (latest(X)) = latest(X); 


asa(X, Y) = if(first(Y), first(X), asa(next(X), next(Y))); 


latestinv(X) = 
where X is Ais in truth values 
or in integer_numerals 
end or 
end where, 


latestinv(fby(X, Y)) = latestinv(X); 
latestinv(latest(X)) = X; 
add(fby(W, X), (Y, Z)) 

Ne UA 3, ‘frst (Y)), add(X, Z)); 


ae (W, X), latest(Y) = 
‘fby (a dd (first (W), latest(Y)), add(X, latest(Y))); 


add (latest (X), fby(Y, 
foy (add (latest(X), haa. add(latest(X), Z)); 


ae W, X), Y) = 
da (first( W), Y), add(X, Y)) 
rahe Y is in integer_numerals end where, 


add(Xx, fby(Y, Z)) 
Soy (add (x, ‘first (Y)), add(X, Z)) 
where X is in integer_numerals end where; 


add (latest (X), latest(Y)) = latest(add(X, Y)); 


add (latest(X), Y) = latest(add(x, Y)) 
where Y is in integer_numerals end where; 


—— 
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add(X, latest(Y)) = latest(add(X, Y)) 
where X is in integer_numerals end where; 


(fby(W, X), foy(Y, Z)) = 
Tautiea dD, first(Y)), equ(X, Z)); 


equ(foy(W, X), latest(Y)) = 


foy(equ(first(W), latest(¥)), equ(X, latest(Y))); 


equ(latest(X), foy(Y, Z)) = 
foy(equ(latest(X), first(Y)), equ(latest(X), Z)); 


equ (fby(W, X), Y) = 
foylequ(first(W), Y), equ(X, Y)) 
where Y is in integer_numerals end where; 


equ(X, foy(Y, Z)) = 
foy(equ(X, first(Y)), equ(X, Z)) 
where X is in integer_numerals end where; 


equ(latest(X), latest(Y)) = latest (equ(X, Y)); 


equ(latest(X), Y) = latest(equ(X, Y)) 
where Y is in integer_numerals end where; 


equ(X, latest(Y)) = latest (equ(X, Y)) 
where X is in integer_numerals end where; 


include addint, equint; 


: A trivial Lucid program 
intlistO = foy(0, add(intlistO, 1); 
outputO = fby(first(intlistO), 


(first (next (intlistO)), 
a re cna attist OD). 
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The equational program given above differs from other Lucid interpreters in 


one significant way, and two superficial ways. First and most significant, the OR 
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operator defined above is not as powerful as the OR operator in Lucid, because it 
fails to satisfy the equation OR(X,true)=true, when X cannot be evaluated to a 
truth value. The weakening of the OR operator is required by restrictions 3 and 5 
of Section 5. These restrictions will be relaxed in later versions of the equation 
interpreter, allowing a full implementation of the Lucid OR. Second, the variable 
INPUT, which in Lucid is implicitly defined to be the sequence of values in the 
input file of the Lucid program when it runs, is not definable until run time, so it 
cannot be given in the equations above. In order to mimic the input behavior of 
Lucid, the equation interpreter would have to be used with a syntactic preprocessor 
to embed given inputs within the term to be evaluated. Interactive input would 
require an interactive interface to the equation interpreter. Such an interactive 
interface does not exist in the current version, but is a likely addition in later ver- 
sions (see Section 15.3). Finally, of the many primitive arithmetic and logical 
operations of Lucid, only add, equ, OR, and if have been given above. To include 
other such operations requires duplicating the equations distributing primitive 
operations over fby and latest. With a large set of primitives, these equations 
would become unacceptably unwieldy. A truly satisfying equational implementation
of Lucid would have to encode primitive operations as nullary symbols, and
use an application operator similar to the one in the Curry inner syntax of Section
4.4 in order to give only one set of distributive equations.


10. Errors, Failures, and Diagnostic Aids 


In the interest of truth in software advertising, exceptional cases in the equation 
interpreter are divided into two classes: errors and failures. Errors are definite 
mistakes on the part of the user resulting from violations of reasonable and concep- 
tually necessary constraints on processing. Failures are the fault of the inter- 
preter itself, and include exhaustion of resources and exceeding of arbitrary limits. 
Each message on an exceptional case is produced on the UNIX standard error file, 
begins with the appropriate word "Error" or "Failure", and ends with an identify- 
ing message number, intended to help in maintenance. An attempt is made to 
explain the error or failure so that the user may correct or avoid it. The eventual 
goal of the project is that the only type of failure occurring in the reduction of a 
term to normal form will be exhaustion of the total space resources. Currently, the 
interpreter will fail when presented with individual input symbols that are too long, 
but it will not fail due to overflow of a value during reduction. There are also some 
possible failures in the syntactic preprocessing and output pretty-printing steps that 
result in messages from yacc (the UNIX parser generator) rather than from the 
equational interpreter system. These failures apparently are all the result of 
overflow of some allocated space, particularly the yacc parsing stack. Occasionally, 
running of a large problem, or of too many problems simultaneously, will cause an 
overflow of some UNIX limits, such as the limit on the number of processes that 


may run concurrently. 


Because of the layered modular design of the interpreter, different sorts of 
errors may be reported at different levels of processing, and, regrettably, in slightly 
different forms. For the preprocessor (ep), the important levels are 1) context-free
syntactic analysis, 2) context-sensitive syntactic analysis, and 3) semantic processing.
For the interpreter (ei), only levels 1 and 3 are relevant. Sections 10.1
through 10.3 describe the sorts of messages produced at each of these levels.


10.1. Context-Free Syntactic Errors and Failures 


Context-free syntactic errors in preprocessor input may involve the general syntax 
of definitions, described in Section 3, or one of the specific syntaxes for terms 
described in Section 4. Context-free errors in interpreter input may only involve a 
specific term syntax. Error messages relating to a specific term syntax always 
include the name of the syntax being used. Error detection is based on the parsing 
strategy used by yacc [Jo78]. Each error message includes a statement of the syn- 
tactic restriction most likely to cause that sort of parsing failure. The parser makes 
no attempt to recover from an error, so only the first syntactic error is likely to be 
reported. It is possible that an error in a term will be detected as an error in the 
general syntax of definitions, and vice versa. Error messages are particularly 
opaque when the wrong syntactic preprocessor was loaded by the last invocation of 
loadsyntax, so the user should always pay attention to the name of the syntax in 
use. Yacc failures are possible in the syntactic preprocessing, either from parser 


stack overflow, or from an individual symbol being too long. 


10.2. Context-Sensitive Syntactic Errors and Failures 


Context-sensitive errors are only relevant to preprocessor input. They all involve
inconsistent use of symbols. The five types of misuse are: 1) repeated declaration
of the same symbol; 2) use of a declared symbol with the wrong arity; 3)
attempt to include a class of symbols or equations that does not exist; 4) repetition
of a variable symbol on the left-hand side of an equation; 5) appearance of a variable
on the right-hand side of an equation that does not appear on the left.
Context-sensitive syntactic preprocessing may fail due to exhaustion of space
resources, or to an individual symbol being too long for the current version. The
second sort of failure will be avoided in later versions. In order to produce a lexicon
presenting all of the symbols used in an equational program, see Section 10.4
below.
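
Checks 4 and 5 above are simple enough to sketch in a few lines. The following Python fragment is only an illustration, not the preprocessor's actual code: it assumes a term is either a string (a variable or a nullary symbol) or a tuple whose first element is the head symbol, and the names variables_in and check_equation are hypothetical.

```python
from collections import Counter


def variables_in(term, variables):
    """List the variable occurrences in a term, left to right."""
    if isinstance(term, str):
        return [term] if term in variables else []
    occurrences = []
    for son in term[1:]:          # term[0] is the head symbol
        occurrences.extend(variables_in(son, variables))
    return occurrences


def check_equation(lhs, rhs, variables):
    """Return messages for violations of checks 4 and 5 in one equation."""
    problems = []
    lhs_counts = Counter(variables_in(lhs, variables))
    for name in sorted(lhs_counts):
        if lhs_counts[name] > 1:
            problems.append(f"variable {name} repeated on left-hand side")
    for name in sorted(set(variables_in(rhs, variables)) - set(lhs_counts)):
        problems.append(f"variable {name} on right-hand side is missing from the left")
    return problems


declared = {"x", "y", "z"}
# f(g(x, y), a) = h(y) passes both checks.
print(check_equation(("f", ("g", "x", "y"), "a"), ("h", "y"), declared))  # []
# f(x, x) = x repeats x on the left; f(x) = g(x, y) uses y only on the right.
print(check_equation(("f", "x", "x"), "x", declared))
print(check_equation(("f", "x"), ("g", "x", "y"), declared))
```

Checks 1 through 3 concern declarations rather than individual equations, so they would need the symbol table as an additional input.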


10.3. Semantic Errors and Failures 


The only semantic failure in the interpreter is exhaustion of total space resources.
Other semantic errors and failures are only relevant to preprocessor input. The 
simplest such error is use of a symbol from one of the classes integer_numerals, 
atomic_symbols, truth_values, or characters, without a declaration of that 
class. In future versions, these errors will be classified as context-sensitive syn- 
tactic errors. The more interesting errors are violations of restrictions 3, 4, and 5 
from Section 5. Violations of these restrictions always involve nontrivial overlay- 
ings of parts of left-hand sides of equations. In addition to describing which res- 
triction was violated, and naming the violating equations, the preprocessor tries to 
report the location of the overlap by naming the critical symbol involved. This is 
probably the weakest part of the error reporting, and future versions will try to 
provide more graphic reports for semantic errors. Notice that restriction 5 (left- 
sequentiality) will be removed in later versions. To specify the offending equations, 
the preprocessor numbers all equations, including predefined classes (counting 1 for 
each class), and reports equations by number. In order to be sure of the number- 
ing used by the preprocessor, and in order to get a more graphic view of the terms 
in the tree representation used by the preprocessor, the user should see Section 10.5 


below. 


10.4. Producing a Lexicon to Detect Inappropriate Uses of Symbols (el)
After executing 
ep Equnsdir 
the user may produce a lexicon listing in separate categories 
1) all declared literal symbols 
2) all declared literal symbols not appearing in equations 
3) all atomic symbols appearing in equations 
4) all characters appearing in equations 
5) all truth values appearing in equations. 
Empty categories are omitted, and symbols within a category are given in alphabet- 
ical order. A lexicon is produced on the standard output by typing 
el Equnsdir 


el stands for equational lexicon. The lexicon is intended to be used to discover 
accidental misspellings and omissions that may cause a symbol to belong to a 
category other than the one intended. Each lexicon is headed by the date and time 
of the last invocation of ep. Changes to definitions after the given date and time 


will not be reflected in the lexicon. 


10.5. Producing a Graphic Display of Equations In Tree Form (es) 


In order to understand the semantic errors described in Section 10.3, it is useful to
see a set of equations in the same form that the preprocessor sees. Not only is this
internal form tree-structured, rather than linear, but there may be literal symbols
appearing in the internal form that are only implicit in the given definitions, such
as the symbol cons, which appears implicitly in the LISP.M expression (a b c).
The user may also use the tree-structured form of the terms in his equations to verify
that the matching of parentheses and brackets in his definitions agrees with his
original intent. To generate a tree-structured display of equations on the standard
output, type


es Equnsdir 


es stands for equation show. Unfortunately, the more mnemonic abbreviations are 
already used for other commands. es may only be used after running ep on the 
same directory. The output from es lists the equations in the order given by the 
user, with the sequential numbers used in error and failure reports from the prepro- 
cessor. Each term in an equation is displayed by listing the symbols in the term in 
preorder, and using indentation to indicate the tree structure. Variables on the 
left-hand sides of equations are replaced by descriptions of their ranges, in pointed 
brackets (<>), and variables on the right-hand sides are replaced by the 
addresses of the corresponding variables on the left-hand sides. Representations of 
predefined classes of equations are displayed, as well as equations given explicitly 
by the user. For example, the following definitions 

Symbols
    f, g: 2;
    h: 1;
    include atomic_symbols.
For all x, y, z:
    f(g(x, y), a) = h(y) where x is in atomic_symbols end where;
    include equatom.


produce the listing:

Listing of equational definitions processed on Apr 19 at 15:43

1:
f
    g
        <atomic_symbol>
        <anything>
    a

h
    variable 1 2
2:
equ
    <atomic_symbol>
    <atomic_symbol>

e
    variable 1
    variable 2


Notice that, on the left-hand side of equation 1, the variable x is replaced by 
<atomic_symbol>, and the variable y is replaced by <anything>, representing 
the fact that any term may be substituted for y. On the right-hand side, y is 


replaced by 


variable 1 2 


indicating that the corresponding y on the left-hand side is the 2nd son of the 1st
son of the root of the term. The date and time at the top refer to the time of invocation
of ep. The user should check that this time agrees with his memory.
Changes to definitions after the given date and time are not reflected in the display.
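
The addressing scheme behind "variable 1 2" can be sketched as follows. This is an illustrative Python fragment, not the code used by es: it assumes terms are nested tuples with the head symbol first, and the function name variable_addresses is hypothetical.

```python
def variable_addresses(term, variables, path=()):
    """Map each variable to its address: the son indices (from 1) leading from the root."""
    if isinstance(term, str):
        return {term: path} if term in variables else {}
    addresses = {}
    for index, son in enumerate(term[1:], start=1):
        addresses.update(variable_addresses(son, variables, path + (index,)))
    return addresses


# Left-hand side of equation 1: f(g(x, y), a)
lhs = ("f", ("g", "x", "y"), "a")
addresses = variable_addresses(lhs, {"x", "y", "z"})
print("variable", *addresses["y"])   # variable 1 2
```

Because restriction-conforming left-hand sides never repeat a variable, each variable has exactly one address, which is why a single path suffices on the right-hand side.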


10.6. Trace Output (et) 
A primitive form of trace output is available, which displays for each reduction 
step the starting term, the redex, the number of the equational rule applied, and 


the reductum. In order to produce trace output, invoke the equation interpreter
with the option t as
ei Equnsdir t <input 


where Equnsdir is the directory containing the equational definitions, and input is 
a file containing the term to be reduced. Since ei uses purely positional notation 
for its parameters, Equnsdir may not be omitted. The invocation of ei above pro- 
duces a file Equnsdir/trace.inter containing a complete trace of the reduction of 


the input term to normal form. To view the trace output on the screen, type 

et Equnsdir 
(Equnsdir defaults to .). et stands for equational trace. The trace listing is 
headed by the date and time of the invocation of ei resulting in that trace. The 
user should check that the given time agrees with his memory. 


10.7. Miscellaneous Restrictions 


Literal symbols are limited to arities no greater than 10, and all symbols are lim- 


ited to lengths no greater than 20 in the current version. 


11. History of the Equation Interpreter Project 


The theoretical foundations of the project come from the dissertation "Reduction 
Strategies in Subtree Replacement Systems," presented by Michael O’Donnell at 
Cornell University in 1976. The same material is available in the monograph Com- 
puting in Systems Described by Equations [O’D77]. There, the fundamental res- 
trictions 1-4 on the left-hand sides of equations in Section 5 were presented, and 
shown to be sufficient for guaranteeing uniqueness of normal forms. In addition, 
outermost reduction strategies were shown to terminate whenever possible, and con- 
ditions were given for the sufficiency of leftmost-outermost reductions. A proof of 
optimality for a class of reduction strategies was claimed there, but shown incorrect 
by Berry and Lévy [BL79, O’D79]. Huet and Lévy later gave a correct treatment 
of essentially the same optimality issue [HL79]. 

In the theoretical monograph cited above, O’Donnell asserted that "a good 
programmer should be able to design efficient implementations of the abstract com- 
putations" described and studied in the monograph. In 1978, Christoph Hoffmann 
and O’Donnell decided to demonstrate that such an implementation is feasible and 
valuable. The original intent was to use the equations for formal specifications of 
interpreters for nonprocedural programming languages. For example, the equations
that McCarthy gave to define LISP [McC60] could be given, and the equation
processor should automatically produce a LISP interpreter exactly faithful to
those specifications. Preliminary experience indicated that such applications were 
severely handicapped in performance. On the other hand, when essentially the 
same computation was defined directly by a set of equations, the equation interpreter
was reasonably competitive with conventional LISP. So, the emphasis of the
project changed from interpreter generation to programming directly with equations.


From early experience, the project goal became the production of a usable 
interpreter of equations with very strict adherence to the semantics given in Section 
1, and performance reasonably competitive with conventional LISP interpreters. 
The specification of such an interpreter was given in [HO82b], and the key imple- 
mentation problems were discussed there. Since the natural way of defining a sin- 
gle function might involve a large number of equations, the second goal requires 
that the interpreter have little or no runtime penalty for the number of equations 
given. Thus, sequential checking for applicability of the first equation, then the 
second, etc. was ruled out, and pattern matching in trees was identified as the key 
algorithmic problem for the project. The overhead of pattern matching appears to 
be the aspect of the interpreter that must compete with the rather slight overhead 
of maintaining the recursion stack in LISP. Some promising algorithms for tree 


pattern matching were developed in [HO82a]. 


In 1979 Giovanni Sacco, a graduate student, produced the first experimentally 
usable version of the interpreter in CDC Pascal, and introduced some table- 
compression techniques which, without affecting the theoretical worst case for pat- 
tern matching, improved performance substantially on example problems. 
Hoffmann ported Sacco’s implementation to the Siemens computer at the Univer- 
sity of Kiel, Germany in 1980. Hoffmann and O’Donnell used Sacco’s implementa- 
tion for informal experiments with LISP, the Combinator Calculus, and the 
Lambda Calculus. These experiments led to the decision to emphasize programming
with equations over interpreter generation. These experiments also demonstrated
the inadequacy of any single notation for all problems, and motivated the
library of syntaxes provided by the current version. Another graduate student,
Paul Golick, transferred the implementation to UNIX on the VAX, and rewrote
the run-time portion of the interpreter (ei in the current version) in 1980. During
1982 and Spring of 1983, O’Donnell took over the implementation effort and pro- 
duced the current version of the system. The final year of work involved informal 
experiments with three different pattern matching techniques, and reconfiguration 


of the implementation to allow easy substitution of different concrete syntaxes. 


Experience with the interpreter comes from the interpreter implementation 
itself, from two projects done in the advanced compiler course at Purdue University,
and from a course in Logic Programming at the Johns Hopkins University.
O’Donnell used the equation interpreter to define the non-context-free syntactic 
analysis for itself, gaining useful informal experience in the applicability of the 
interpreter to syntactic problems. In 1982, Hoffmann supervised a class project 
which installed another experimental pattern-matching algorithm in the interpreter, 
and used the equation interpreter to define a Pascal interpreter. In 1983, 
Hoffmann supervised another class project using the equation interpreter to define 
type checking in a Pascal compiler. These two projects generated more information 
on the suitability of various pattern-matching algorithms, and on the applicability 
of equations to programming language problems. In 1983, O’Donnell assigned stu- 
dents in a Logic Programming course to a number of smaller projects in equational 
programming. One of these projects found the first natural example of a theoretical
combinatorial explosion in one of the pattern-matching algorithms.


Compared to the syntactic restrictions of conventional languages like PASCAL, the
restrictions on equations described in Section 5 are a bit subtle. We believe that 
the additional study needed to understand the restrictions is justified for several 
reasons. First, the restrictions are similar in flavor to those imposed by determinis- 
tic parsing strategies such as LR and LALR, and perhaps even a bit simpler. The 
trouble taken to satisfy the restrictions is rewarded by the guarantee that the 
resulting program produces the same result, independently of the order of evalua- 
tion. This reward should become very significant on parallel hardware of the 
future, where the trouble of insuring order-independence in a procedural program 
may be immense. Finally, there are disciplined styles of programming with equa- 
tions that can avoid errors, and techniques for correcting the errors when they 
occur. A few such techniques are given in this section; we anticipate that a sizable 


collection will result from a few years of experience. 


12.1. A Disciplined Programming Style Based on Constructor Functions 

In many applications of equational programming, the function symbols may be par- 
titioned into two classes: 

1. constructor symbols, used to build up static data objects, and 

2. defined symbols, used to perform computations on the data objects. 

For example, in LISP M-expressions, the atomic symbols, nil, and the binary sym- 
bol cons, are constructors, and all metafunction symbols are defined symbols. 
Technically, a constructor is a symbol that never appears as the outermost symbol
on the left-hand side of an equation, and a defined symbol is one that does appear
as the outermost symbol on a left-hand side. The constructor discipline consists of
never allowing a defined symbol to appear on a left-hand side, except as the outermost
symbol. An equational program that respects the constructor discipline
clearly satisfies the nonoverlapping restriction 4 of Section 5.


Example 12.1.1 

The following set of equations, in standard mathematical notation, does not respect 
the constructor discipline, although it does not contain an overlap. 

Symbols
    f: 1;
    g: 2;
    include atomic_symbols.
For all x:
    f(g(a, x)) = g(a, f(x));
    g(b, x) = a.


The symbol g is a defined symbol, because it appears outermost on the left-hand 
side of the second equation, but g also appears in a nonoutermost position on the 
left-hand side of the first equation. On the other hand, the following set of equa- 
tions accomplishes the same result, but respects the constructor discipline. 
For all x:
    f(h(x)) = g(a, f(x));
    g(a, x) = h(x);
    g(b, x) = a.


Here, h, a, and b are constructors, f and g are defined symbols. Neither f nor g 
appears on a left-hand side except as the outermost symbol. The occurrences of f 


and g on the right-hand sides are irrelevant to the constructor discipline. 


□
The constructor discipline avoids violations of the nonoverlapping restriction 4, but
it does not prevent violations of restriction 3, which prohibits two different left-hand
sides from matching the same term. For example, f(a,x)=a and f(x,a)=a
violate restriction 3, although the defined symbol f does not appear on a left-hand
side except in the outermost position.


When the constructor discipline is applied, the appearance of a defined symbol
in a normal form is usually taken to indicate an error, either in the equations or in 
the input term. Many other research projects, particularly in the area of abstract 
data types, require the constructor discipline, and sometimes require that defined 
symbols do not appear in normal forms [GH78]. The latter requirement is often 


called sufficient completeness. 
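
The definitions of constructor and defined symbol given in this section lend themselves to a mechanical check. The Python fragment below is a minimal sketch under stated assumptions: terms are nested tuples with the head symbol first, variables are identified by membership in a given set, and the names head, symbols_below_root, and discipline_violations are hypothetical. It is not part of the equation interpreter, which does not enforce the discipline.

```python
def head(term):
    """Outermost symbol of a term (a bare string is its own head)."""
    return term[0] if isinstance(term, tuple) else term


def symbols_below_root(term, variables):
    """Collect every non-variable symbol strictly below the root of a term."""
    found = set()
    if isinstance(term, tuple):
        for son in term[1:]:
            if isinstance(son, tuple) or son not in variables:
                found.add(head(son))
            found |= symbols_below_root(son, variables)
    return found


def discipline_violations(left_hand_sides, variables):
    """Defined symbols that occur in a nonoutermost position on some left-hand side."""
    defined = {head(lhs) for lhs in left_hand_sides}
    violations = set()
    for lhs in left_hand_sides:
        violations |= symbols_below_root(lhs, variables) & defined
    return violations


# Left-hand sides of the two programs of Example 12.1.1.
first = [("f", ("g", "a", "x")), ("g", "b", "x")]
second = [("f", ("h", "x")), ("g", "a", "x"), ("g", "b", "x")]
print(discipline_violations(first, {"x"}))    # {'g'}
print(discipline_violations(second, {"x"}))   # set()
# f(a, x) and f(x, a) pass this check even though they violate restriction 3.
print(discipline_violations([("f", "a", "x"), ("f", "x", "a")], {"x"}))   # set()
```

The last call illustrates the point made above: the discipline rules out overlaps (restriction 4) but says nothing about two left-hand sides matching the same term (restriction 3), which needs a separate test.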


It is possible to translate every equational program satisfying restrictions 1-4 

(i.e., the regular term reduction systems) into an equational program that respects 
the constructor discipline. The idea, described by Satish Thatte in [Th85], is to 
create two versions, f and f’, of each defined symbol that appears in a nonouter- 
most position on a left-hand side. f remains a defined symbol, while f’ becomes a 
constructor. Every offending occurrence of f (i.e., nonoutermost on a left-hand 
side) is replaced by f’. In addition, equations are added to transform every f
that heads a subterm not matching a left-hand side into f’.
Example 12.1.2 
Applying the procedure described above to the first equational program in Example 
12.1.1 yields the following program. 
For all x:
    f(g'(a, x)) = g(a, f(x));
    g(a, x) = g'(a, x);
    g(b, x) = a.
□
In the worst case, this procedure could increase the size of the program quadrati- 
cally, although worst cases do not seem to arise naturally. At any rate, the con- 
structor discipline should probably be enforced by the programmer as he programs, 
rather than added on to a given program. In Section 12.4 we show how to use a 
similar procedure to eliminate overlaps. 

The constructor discipline is rather sensitive to the syntactic form that is actu- 
ally used by the equation interpreter. 
Example 12.1.3 
Consider the following program, given in Lambda notation. 


Symbols 
AP: 2; 


include atomic_symbols. 


For all x, y: 
(REFLECT (CONS x y)) = (CONS (REFLECT x) (REFLECT y)); 


(REFLECT x) = x where x is in atomic_symbols end where. 
Recall that this is equivalent to the standard mathematical notation: 
For all x, y: 


AP(REFLECT, AP(AP(CONS, x), y)) = 
AP(AP(CONS, AP(REFLECT, x)), AP(REFLECT, y)); 


AP(REFLECT, x) = x where x is in atomic_symbols end where.
This program does not respect the constructor discipline, as the defined symbol AP
appears twice in nonoutermost positions in the left-hand side of the first equation. 
As long as no inputs will contain the symbols REFLECT or CONS except applied 
(using AP) to precisely one or two arguments, respectively, the same results may 
be obtained by the following un-Curried program in standard mathematical nota- 
tion. 

For all x, y: 

REFLECT(CONS (x, y)) = CONS(REFLECT(x), REFLECT(y)); 


REFLECT(x) = x where x is in atomic_symbols end where. 


The last program respects the constructor discipline. 


□
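
The correspondence between Lambda notation and the explicit AP form used in Example 12.1.3 can be sketched in Python. This is an illustration only: it assumes a parenthesized application has already been parsed into a non-empty nested list, and the names curry and show are hypothetical rather than part of the interpreter's syntactic preprocessors.

```python
def curry(expr):
    """Translate an application, given as a non-empty nested list, into binary AP terms."""
    if not isinstance(expr, list):
        return expr                       # a symbol or variable stands for itself
    result = curry(expr[0])
    for argument in expr[1:]:             # application associates to the left
        result = ("AP", result, curry(argument))
    return result


def show(term):
    """Print a term in standard mathematical notation."""
    if isinstance(term, tuple):
        return f"{term[0]}({', '.join(show(son) for son in term[1:])})"
    return term


# (REFLECT (CONS x y)), the left-hand side of the first equation
print(show(curry(["REFLECT", ["CONS", "x", "y"]])))
# AP(REFLECT, AP(AP(CONS, x), y))
```

Every application node becomes an AP, which makes visible why AP, a defined symbol here, necessarily shows up in nonoutermost positions of Curried left-hand sides.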

Example 12.1.4 

Weak reduction in the combinator calculus [Sc24, St72] may be programmed in 
Lambda notation as follows. 

Symbols 


AP: 2; 
include atomic_symbols. 


For all x, y, z:
    (S x y z) = (x z (y z));
    (K x) = x.


As in the first program of Example 12.1.3, the constructor discipline does not hold,
because of the implicit occurrences of the defined symbol AP in nonoutermost positions
of the first left-hand side. The left-hand sides may be un-Curried to

    S(x, y, z)
    K(x)


The latter program respects the constructor discipline, with S and K being defined 
symbols, and no constructors mentioned in the left-hand sides. The right-hand 
sides cannot be meaningfully un-Curried, without extending the notation to allow 
variables standing for functions. 
□

One is tempted to take a symbol in Curried notation as a defined symbol when 
it appears leftmost on a left-hand side of an equation. Unfortunately, this natural 
attempt to extend the constructor discipline systematically to Curried notation fails 
to guarantee the nonoverlapping property. 
Example 12.1.5 
In the following program, given in Lambda notation, the symbol P appears only 
leftmost in left-hand sides of equations. 


Symbols 
AP: 2; 


include atomic_symbols. 
For all x, y, z: 

    (P x y) = Q;
    (P x y z) = R.


The two left-hand sides overlap, however, and (P Q Q Q) has the two different normal
forms (Q Q) and R.

□

Informally, the overlap violation above appears to translate to a violation of restriction
3 in an un-Curried notation. Formalization of this observation would require a
treatment of function symbols with varying arities. The appropriate formalization
for this case is not hard to construct, but other useful syntactic transformations
besides Currying may arise, and might require totally different formalisms to relate
them to the constructor discipline.
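
The two normal forms in Example 12.1.5 can be reproduced with a very small rewriting sketch in Python. It is written under simplifying assumptions that are not the interpreter's: a Curried term is represented by its spine, a tuple holding the leftmost symbol followed by its arguments, and rewriting is attempted only at the head of the spine, which suffices here because all arguments are the atom Q. The function names are hypothetical.

```python
def rule_1(spine):
    """(P x y) = Q, applied to the first two arguments of a spine headed by P."""
    if spine[0] == "P" and len(spine) >= 3:
        return ("Q",) + spine[3:]
    return None


def rule_2(spine):
    """(P x y z) = R, applied to the first three arguments of a spine headed by P."""
    if spine[0] == "P" and len(spine) >= 4:
        return ("R",) + spine[4:]
    return None


def normal_form(spine, rules):
    """Rewrite at the head, trying the rules in the given order, until none applies."""
    while True:
        for rule in rules:
            reduced = rule(spine)
            if reduced is not None:
                spine = reduced
                break
        else:
            return spine


term = ("P", "Q", "Q", "Q")                  # (P Q Q Q)
print(normal_form(term, [rule_1, rule_2]))   # ('Q', 'Q'), that is, (Q Q)
print(normal_form(term, [rule_2, rule_1]))   # ('R',), that is, R
```

The result depends on which rule is tried first, which is exactly the order dependence that the nonoverlapping restriction is meant to exclude.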


Because of the sensitivity of the constructor discipline to syntactic assump- 
tions, and because the enforcement of this discipline may lead to longer and less 
clear equational programs, the equation interpreter does not enforce such a discip- 
line. Whenever a particular problem lends itself to a solution respecting the con- 
structor discipline, we recommend that the programmer enforce it on himself, and 
document the distinction between constructors and defined symbols. So far, most 
examples of equational programs that have been run on the interpreter have 
respected the constructor discipline, and the examples of nonoverlapping equations 
not based on constructors have been few, and often hard to construct. So, experi- 
ence to date fails to give strong support for the utility of the greater generality of 
nonoverlapping equations. We expect that future versions of the interpreter will 
enforce even weaker restrictions, based on the Knuth-Bendix closure algorithm 
[KB70], and that substantial examples of programs requiring this extra generality 
will arise. Further research is required to adapt the Knuth-Bendix procedure, 
which was designed for reduction systems in which every term has a normal form, 


to nonterminating systems. 


12.2. Simulation of LISP Conditionals 


The effort expended in designing and implementing the equation interpreter would 
be wasted if the result were merely a syntactic variant of LISP. For the many 
problems, and portions of problems, however, for which LISP-style programming is
appropriate, a programmer may benefit from learning how to apply an analogous
style to equational programming. The paradigm of LISP programming is the
recursive definition of a function, based on a conditional expression. The general
form, presented in informal notation, looks like


f[x] = if P1[x] then E1
       else if P2[x] then E2
       ...
       else if Pn[x] then En
       else En+1


where P1[x] is usually "x is nil", or occasionally "x is atomic", and P2, ..., Pn
require more and more structure for x. In order to program the same computation
for the equation interpreter, each line of the conditional is expressed as a separate
equation. The conditions P1, ..., Pn are expressed implicitly in the structure of
the arguments to f on the left-hand sides of the equations, and occasionally in syn- 
tactic restrictions on the variables. Since there is no order for the equations, the 
effect of the order of conditional clauses must be produced by letting each condi- 
tion include the negation of all previous conditions. As long as the conditions deal 
only with the structure of the argument, rather than computed qualities of its 
value, this translation will produce a more readable form than LISP syntax, and 
the incorporation of negations of previous conditions will not require expansion of 
the size of the program. The else clause must be translated into an equation that 
applies precisely in those cases where no other condition holds. Expressing this 
condition explicitly involves some extra trouble for the programmer, but has the 
benefit of clarifying the case analysis, and illuminating omissions that might be 


more easily overlooked in the conditional form. If the programmer accidentally
provides two equations that could apply in the same case, the interpreter detects a
violation of restriction 3. If he neglects to cover some case, the first time that a
program execution encounters such a case, the offending application of f will
appear unreduced in the output, displaying the omission very clearly.


Example 12.2.1 
Consider the following informal definition of a function that flattens a binary tree 
into a long right branch, with the same atomic symbols hanging off in the same 


order. 


flat[x] = if x is atomic then x
          else if car[x] is atomic then cons[car[x]; flat[cdr[x]]]
          else flat[cons[car[car[x]]; cons[cdr[car[x]]; cdr[x]]]]


The actual LISP program, using the usual abbreviations for compositions of car 


and cdr, follows. 


(DEF '(FLAT (LAMBDA (X)
   (COND
      ((ATOM X) X)
      ((ATOM (CAR X)) (CONS (CAR X) (FLAT (CDR X))))
      (T (FLAT (CONS (CAAR X) (CONS (CDAR X) (CDR X)))))
   ))))


The same computation is described by the following equational program, using 
LISP.M notation. 
Symbols 

flat: 1; 

cons: 2; 

nil: 0; 

include atomic_symbols. 


For all x, y, z:
    flat[x] = x where x is in atomic_symbols end where;
    flat[(x . y)] = (x . flat[y]) where x is in atomic_symbols end where;
    flat[((x . y) . z)] = flat[(x . (y . z))].


□
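
For readers who want to experiment with the three equations of Example 12.2.1, here is a minimal Python sketch of the same case analysis. It rests on assumptions of its own: atomic symbols are Python strings, a dotted pair (x . y) is the 2-tuple (x, y), and the function is an ordinary recursive procedure rather than a set of unordered equations.

```python
def flat(t):
    """Flatten a binary tree of pairs into a long right branch."""
    # flat[x] = x where x is in atomic_symbols
    if isinstance(t, str):
        return t
    left, right = t
    # flat[(x . y)] = (x . flat[y]) where x is in atomic_symbols
    if isinstance(left, str):
        return (left, flat(right))
    # flat[((x . y) . z)] = flat[(x . (y . z))]
    x, y = left
    return flat((x, (y, right)))


print(flat((("a", "b"), ("c", "d"))))   # ('a', ('b', ('c', 'd')))
```

Python tests the branches in order, but the three conditions are mutually exclusive, so reordering them would not change the result. That mirrors the requirement that each equation carry the negation of all earlier conditions in its own left-hand side.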


When conditions in a LISP program refer to predicates that must actually be com- 
puted from the arguments to a function, rather than to the structure of the argu- 
ments, the programmer must use a corresponding conditional function in the equa- 
tional program. The translation from LISP syntax for a conditional function to the 


LISP.M notation for the equation interpreter is utterly trivial. 


12.3. Two Approaches to Errors and Exceptional Conditions 


The equation interpreter has no built-in concept of a run-time error. There are 
failures at run-time, due to insufficient space resources, but no exceptional condi- 
tion caused by application of a function to inappropriate arguments is detected. 
We designed the interpreter this way, not because we believe that run-time error 
detection is undesirable, but rather because it is completely separated from the 
other fundamental implementation issues. We decided to provide an environment 
in which different ways of handling errors and exceptions may be tried, rather than 
committing to a particular one. If errors are only reported to the outside world,
then the reporting mechanism is properly one for a syntactic postprocessor. Certain
normal forms, such as car[()] in LISP, are perfectly acceptable to the equation
interpreter, but might be reported as errors when detected by a postprocessor.
Total support of this approach to errors may require a mechanism for halting
evaluation before a normal form is reached, but that mechanism will be provided in
future versions as a general augmentation of the interface (see Section 15.3),
allowing the interpreter to act in parallel with other components. No specific
effort should be required for error detection.


If a programmer wishes to detect and react to errors and exceptional condi- 
tions within an equational program, two basic strategies suggest themselves. In the 
first strategy, exception detection is provided by special functions that inspect a 
structure to determine that it may be manipulated in a certain way, before that 
manipulation is attempted. In the other strategy, special symbols are defined to 
represent erroneous conditions. Reaction to exceptions is programmed by the way 
these special symbols propagate through an evaluation. We chose to provide a set- 
ting in which many strategies can be tested, rather than preferring one. 


Example 12.3.1 


Consider a table of name-value pairs, implemented as a list of ordered pairs. The 
table is intended to represent a function from some name space to values, but occa- 
sionally certain names may be accidentally omitted, or entered more than once. In 
a program for a function to look up the value associated with a given name, it may 
be necessary to check that the name occurs precisely once. The following programs 
all use LISP.M notation. The first applies the strategy of providing checking
functions. Efficiency has been ignored in the interest of clarifying the fundamental
issue.

Symbols

cons: 2;
nil: 0;
occurs: 2;
legmap: 2;
lookup: 2;
add: 2;
equ: 2;
if: 3;
include atomic_symbols, integer_numerals, truth_values.

For all m, n, l, v:

occurs[m; ()] = 0;

occurs[m; ((n . v) . l)] =
   if[equ[m; n]; add[occurs[m; l]; 1]; occurs[m; l]];

legmap[m; l] = equ[occurs[m; l]; 1];

lookup[m; ((n . v) . l)] =
   if[equ[m; n]; v; lookup[m; l]];

if[true; m; n] = m;   if[false; m; n] = n;

include equint, addint, equatom.


When lookup[m; l] is used in a larger program, the programmer will have to test
legmap[m; l] first, if there is any chance that m is not associated uniquely in l.
Essentially the same facility is provided by the following program, which applies
the strategy of producing special symbols to represent errors.

Symbols

cons: 2;
nil: 0;
lookup: 2;
undefined, overdefined: 0;
if: 3;
equ: 2;
include atomic_symbols, truth_values.

For all m, n, l, v:

lookup[m; ()] = undefined[];

lookup[m; ((n . v) . l)] =
   if[equ[m; n];
      if[equ[lookup[m; l]; undefined[]]; v; overdefined[]];
      lookup[m; l]];

if[true; m; n] = m;   if[false; m; n] = n;

include equatom.

□


Either strategy may be adapted to produce more or less information about the
precise form of an error or exceptional occurrence. There appears to be no technical
reason to prefer one to the other. The choice must depend on the programmer's
taste. It is probably foolish to mix the two strategies within one program.
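The contrast between the two strategies of Example 12.3.1 can be sketched in Python as follows. This is an illustration only: the table is assumed to be a Python list of (name, value) pairs, the sentinel objects UNDEFINED and OVERDEFINED stand in for the special symbols, and the function names lookup_checked and lookup_marked are hypothetical.

```python
# Sentinels standing in for the special symbols undefined[] and overdefined[].
UNDEFINED = object()
OVERDEFINED = object()


def occurs(m, table):
    """Count how many pairs in the table carry the name m."""
    return sum(1 for name, _ in table if name == m)


def legmap(m, table):
    """First strategy: a checking function the caller tests before lookup."""
    return occurs(m, table) == 1


def lookup_checked(m, table):
    """First strategy: return the first value bound to m, with no checking."""
    for name, value in table:
        if name == m:
            return value
    # No equation covers the empty table; the equational program would simply
    # leave lookup[m; ()] as a normal form. The sketch raises instead.
    raise LookupError(m)


def lookup_marked(m, table):
    """Second strategy: special symbols report missing or duplicated names."""
    if not table:
        return UNDEFINED
    (name, value), rest = table[0], table[1:]
    if name == m:
        return value if lookup_marked(m, rest) is UNDEFINED else OVERDEFINED
    return lookup_marked(m, rest)
```

With the first strategy the caller writes the test legmap(m, table) explicitly; with the second, the caller inspects the result for the two sentinels.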


12.4. Repairing Overlaps and Nonsequential Constructs 


When a set of logically correct equations is rejected by the equation interpreter 
because of overlapping left-hand sides, there are two general techniques that may 
succeed in removing the overlap, without starting from scratch. The first, and sim- 
plest, is to generalize the equation whose left-hand side applies outermost in the 
offending case, so that the overlap no longer involves explicitly-given symbols, but 
only an instance of a variable. 

Example 12.4.1 

Consider lists of elements in LISP.M notation, where a special element missing, 
different from nil, is to be ignored whenever it appears as a member of a list. 
Such a missing element would allow recursive deletion of elements from a list in 
an especially simple way. The equation defining the behavior of missing may 
easily overlap with other equations. 

Symbols

cons: 2;
nil: 0;
missing: 0;

: mirror[l] is l concatenated with its own reversal.
mirror: 1;

include atomic_symbols.

For all l, l1, x:

(missing[] . l) = l;

mirror[l] = append[l; reverse[l]] where l is either ()
                                             or (x . l1)
                                          end or
                                  end where;

The two equations in the fragment above overlap in the term

mirror[(missing[] . l)]


This overlap may be avoided by deleting the where clause on the second equation, 
and allowing the mirror operation to apply to elements other than lists, perhaps 
with meaningless results. Of course, the first equation will certainly overlap with 
other equations defining, for example, reverse and append, and these overlaps will 
require other avoidance techniques. 
□

The technique of generalization, shown above, only works when unnecessarily 
restrictive equations have led to overlap. More often, overlaps may be removed by 
restricting the equation whose left-hand side appears innermost in the offending 
case, either by giving more structure to the left-hand side, or by adding extra sym- 
bols to differentiate different instances of the same operation. The second method 
seems to be unavoidable in some cases, but it regrettably decreases the readability 
and generality of the program. 
Example 12.4.2 
Consider an unusual setting for list manipulation, based on an associative
concatenation operator cat, instead of the constructor cons of LISP. An atomic
symbol a is identified with the singleton list containing only a. The following
program, given in standard mathematical notation, enforces the associativity of cat
by always associating to the right, and defines reversal as well.
Symbols 
cat: 2; 
reverse: 1; 
include atomic_symbols. 
For all x, y, z: 


cat(cat(x, y), z) = cat(x, cat(y, z)) 
where x is in atomic_symbols end where; 


reverse(cat(x, y)) = cat(reverse(y), reverse(x)); 


reverse(x) = x where x is in atomic_symbols end where. 


The restriction of x in the first equation to be an atomic symbol prevents a
self-overlap of the form cat(cat(cat(A,B),C),D), but there is still an overlap
between the first and second equations in the form reverse(cat(cat(A,B),C)). The
same effect may be achieved, without overlap, by restricting the variable x to
atomic symbols in the second equation as well. This correction achieves the right
output, but incurs a quadratic cost for reverse, because of the reassociation of
cats implied by it. See Section 16.1 for a more thorough development of this novel
approach to list manipulation, achieving the linear reversal cost that was probably
intended by the program above.

□
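The source of the quadratic cost can be seen in a small Python sketch of the corrected program. The representation is an assumption of the sketch: a cat term is the tuple ("cat", x, y) and atomic symbols are strings. The function cat plays the role of the first equation, always associating to the right, so each call made by reverse walks down the whole list built so far.

```python
def cat(x, y):
    """Build cat(x, y), reassociating to the right as the first equation does."""
    if isinstance(x, tuple):
        # cat(cat(a, rest), y) = cat(a, cat(rest, y)) where a is atomic
        _, a, rest = x
        return ("cat", a, cat(rest, y))
    return ("cat", x, y)


def reverse(t):
    """reverse(cat(x, y)) = cat(reverse(y), reverse(x)) with x atomic."""
    if isinstance(t, tuple):
        _, a, rest = t
        # reverse(a) = a for atomic a; cat re-walks reverse(rest) on every call.
        return cat(reverse(rest), a)
    return t
```

Reversing a list of n atoms makes n calls to cat, and the k-th call traverses a list of about k atoms, which is the reassociation cost described above.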


Example 12.4.3 
In order to approach an implementation of the lambda calculus, one might take the
Curried notation in which binary application is the only operation, and add a
syntactic substitution operator. The following program defines substitution, and for
the sake of simplicity only the identity function, using standard mathematical
notation. subst(x,y,z) is intended to denote the result of substituting x for each
occurrence of y in z.


Symbols 


include atomic_symbols, truth_values. 
For all w, x, y, z:

AP(I(), x) = x;

subst(w, x, y) = if(equ(x, y), w, y)
   where y is in atomic_symbols end where;

subst(w, x, AP(y, z)) = AP(subst(w, x, y), subst(w, x, z));

if(true, x, y) = x;   if(false, x, y) = y;


include equatom. 


The first and third equations overlap in the form subst(A,B,AP(I(),C)). In order
to avoid this overlap, change nonoutermost occurrences of the symbol AP on
left-hand sides (there is only one in this example, in the third equation) into a new
symbol, IAP (Inert APplication). Two additional equations are required to convert
AP to IAP when appropriate.

For all w, x, y, z:

AP(I(), x) = x;

subst(w, x, y) = if(equ(x, y), w, y)
   where y is in atomic_symbols end where;

subst(w, x, IAP(y, z)) = AP(subst(w, x, y), subst(w, x, z));

AP(x, y) = IAP(x, y) where x is in atomic_symbols end where;

AP(IAP(x, y), z) = IAP(IAP(x, y), z);

if(true, x, y) = x;   if(false, x, y) = y;

include equatom.

Notice that IAP is never used on the right-hand side. Essentially, the use of IAP
enforces innermost evaluation in those cases where left-hand sides used to overlap.


The technique of adding symbols to avoid overlap, illustrated in Example 12.4.3, is
essentially the same idea as that used by Thatte [Th85] to translate nonoverlapping
equations into the constructor discipline (see Section 12.1). Example 16.3.3 shows
how the same technique was used independently by Hoffmann, who thought of
overlap as a potential inconsistency in a concurrent program, and used locks to
avoid the inconsistency.


When a logically correct set of equations is rejected by the equation interpreter
because of a failure of left-sequentiality, the first thing to try is reordering
the arguments to offending functions. If the program is part of a polished product,
the reordering may be accomplished in syntactic pre- and postprocessors. If the
trouble of modifying the syntactic processors is too great, the user, regrettably,
must get accustomed to seeing the arguments in the new order.


Example 12.4.4 

Consider a program generalizing the LISP functions car and cdr to allow an
arbitrarily long path to a selected subtree. select[t;p] is intended to select the
subtree of t reached by the path p from the root of t. Paths are presented as lists
of atomic symbols L and R, representing left and right branches, respectively.

Symbols

cons: 2;
nil: 0;
select: 2;
include atomic_symbols.

For all x, y, p:

select[x; ()] = x;
select[(x . y); (L . p)] = select[x; p];
select[(x . y); (R . p)] = select[y; p].

Left-sequentiality fails for the program above because, after seeing the symbol
select, there is no way to decide whether or not to inspect the first argument,
seeking a cons, or the second argument, seeking (). Only after seeing the second
argument, and determining whether or not it is (), can the interpreter know whether
or not the first argument is relevant. Left-sequentiality is restored merely by
reversing the arguments to select.

For all x, y, p:

select[(); x] = x;
select[(L . p); (x . y)] = select[p; x];
select[(R . p); (x . y)] = select[p; y].

□
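The order of inspection in the repaired program can be illustrated with a short Python sketch. It is a sketch under assumptions, not the interpreter's mechanism: paths are Python sequences of the strings "L" and "R", and cons cells are 2-tuples. The path is examined first, and the tree is taken apart only when the path is known to be nonempty.

```python
def select(path, tree):
    """select[p; t]: follow the path p from the root of the tree t."""
    for step in path:
        # The tree is inspected only after a path step has been seen.
        if not isinstance(tree, tuple):
            # No equation applies here; the equational program would leave
            # the term as a normal form. The sketch raises instead.
            raise ValueError("path runs past an atomic symbol")
        left, right = tree
        if step == "L":
            tree = left
        elif step == "R":
            tree = right
        else:
            raise ValueError("path steps must be 'L' or 'R'")
    # select[(); x] = x: the tree is returned without being inspected.
    return tree
```

With an empty path, the tree argument is returned without being examined at all, which is exactly the case that defeated left-sequentiality in the original argument order.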

Failures of left-sequentiality may also be repaired by artificially forcing the
interpreter to inspect an argument position, by replacing the variable with a
disjunction of all possible forms substituted for that variable. This technique
degrades the clarity of the program, and risks the omission of some possible form.
In the worst case, the forced inspection of an ill-defined argument could lead to an
unnecessary infinite computation.
Example 12.4.5 
The first program of Example 12.4.4 may also be repaired by replacing the first
equation,

select[x; ()] = x;

with the two equations,

select[x; ()] = x where x is in atomic_symbols end where;

select[(x . y); ()] = (x . y);

or, equivalently, by qualifying the variable x to be

either (y . z) or in atomic_symbols end or

□


Whenever permutation of arguments restores left-sequentiality, that method is pre- 
ferred to the forced inspection. In some cases, however, argument permutation 
fails where forced inspection succeeds. 
Example 12.4.6 
The parallel or function may be defined by 
Symbols 
or: 2; 
include truth_values. 
For all x: 
or(true, x) = true;

or(x, true) = true;

or(false, false) = false.


Left-sequentiality fails because of the first two equations. If the second equation is 
changed to 
or(false, true) = true; 


then left-sequentiality is restored. The or operator in the modified equations is
sometimes called the conditional or. The results are the same as long as
evaluation of the first argument to or terminates, but arbitrarily much effort may be
wasted evaluating the first argument when the second is true. In the worst case,
evaluation of the first argument might never terminate, so the final answer of true
would never be discovered.

□
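The behavioral difference between the two operators can be simulated in Python. The sketch below is only a model under stated assumptions: an argument is a generator that yields once per evaluation step and finally returns a truth value, and the helper names steps, diverge, and advance are hypothetical. The conditional or finishes its first argument before looking at the second, while the parallel or interleaves steps of both.

```python
def steps(n, result):
    """A computation that takes n steps and then returns result."""
    for _ in range(n):
        yield
    return result


def diverge():
    """A computation that never terminates."""
    while True:
        yield


def advance(computation):
    """Run one step; return (finished, value)."""
    try:
        next(computation)
    except StopIteration as stop:
        return True, stop.value
    return False, None


def conditional_or(a, b):
    """Evaluate a to completion first, then b if needed."""
    for arg in (a, b):
        finished, value = advance(arg)
        while not finished:
            finished, value = advance(arg)
        if value:
            return True
    return False


def parallel_or(a, b):
    """Interleave steps of a and b; answer true as soon as either does."""
    pending = [a, b]
    while pending:
        for arg in list(pending):
            finished, value = advance(arg)
            if finished and value:
                return True
            if finished:
                pending.remove(arg)
    return False
```

In this model, parallel_or(diverge(), steps(3, True)) returns True, whereas conditional_or with the same arguments would loop forever on its first argument.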


13. Use of Equations for Syntactic Manipulations 


While computer programming in general remains something of a black art, certain 
special problem areas have been reduced to a disciplined state where a competent 
person who has learned the right techniques may be confident of success. In
particular, the use of finite-state lexical analyzers and push-down parsers, generated
automatically from regular expressions and context-free grammars, has reduced the 
syntactic analysis of programming languages, and other artificially designed 
languages, to a reliable discipline [AU72]. The grammatical approach to syntactic 
analysis is also beneficial because the grammatical notation is reasonably self- 
documenting - sufficiently so that the same grammar may be used as input to an 
automatic parser generator, and as appendix to a programmer’s manual, providing 
a reliable standard of reference to settle subtle syntactic issues that are not 
explained sufficiently in the text of the manual. Beyond context-free manipula- 
tions, the best-known contender for formal generation of language processors is the 
attribute grammar [Kn68, AU72]. An attribute grammar is a context-free gram- 
mar in which each nonterminal symbol may have attribute values associated with 
it, and each production is augmented with a description of how attributes of non- 
terminals in that production may be computed from one another. Although they 
have proved very useful in producing countless compilers, interpreters, and other 
language processors, attribute grammars have not provided a transparency of
notation comparable to that of context-free grammars, especially when, as is usually
the case, actual computation of one attribute from another is described in a
conventional programming language, such as C.


Linguists who seek formal descriptions of natural languages have tried to
boost the power of context-free grammars with transformational grammars
[Ch65]. A transformational grammar, like an attribute grammar, contains a
context-free grammar, but the context-free grammar is used to produce a tree, 
which is then transformed by schematically presented transformation rules. The 
parse tree produced by the context-free grammar is called the surface structure, 
and the result of the transformations is called the deep structure of an input string 
of symbols. A number of different formal definitions of transformational grammar 
have been proposed, all of them suffering from complex mechanisms for controlling 
the way in which tree transformations are applied. We propose the equation inter- 
preter as a suitable mechanism for the transformational portion of transformational 
grammars. By enforcing the confluence property, the equation interpreter finesses 
the complex control mechanisms, and returns to a notation that has the potential 
for self-documenting qualities analogous to those of context-free grammars. The 
concepts of surface structure, which captures the syntactic structure of the source 
text, without trivial lexical details, and deep structure, which is still essentially syn- 
tactic, but which captures the structure of the syntactic concepts described by the 
source text, rather than the structure of the text itself, appear to be very useful 
ones in the methodology of automatic syntactic analysis. The analysis into surface 
and deep structures should be viewed as a refinement of the idea of abstract
syntax - syntax in tree form freed of purely lexical issues - which has already proved
a useful organizing concept for language design [McC62, La65].


In this section, we propose that the concept of abstract syntax be made an 
explicit part of the implementation of syntactic processors, as well as a design
concept. Rather than extend the traditional formalisms for context-free grammars, we
develop a slightly different notation, clearly as strong in its power to define sets of
strings, and better suited to the larger context of regular/context-free/equational
processing.


13.1 An Improved Notation for Context-Free Grammars 


The tremendous success of context-free grammars as understandable notations for 
syntactic structure comes from their close connection to natural concepts of type 
structure. By making the connection even closer, we believe that clarity can be 
improved, and deeper structural issues separated more cleanly from superficial lexi- 
cal ones. The first step is to start, not with source text, which is a concrete realiza- 
tion of a conceptual structure, but with the conceptual structure itself. By "concep- 
tual structure," we mean what a number of programming language designers have 
called abstract syntax [McC62, La65], and what Curry and Feys called formal 
objects or Obs [CF58]. We do not intend meanings or semantic structures: rather 
abstract, but syntactic, mental forms that are represented straightforwardly by 
source texts. Just as programming language design should start with the design of 
the abstract syntax, and seek the most convenient way to represent that abstract 
syntax in text, the formal description of a language should first define the abstract 
structure of the language, then its concrete textual realization. Abstract syntaxes 
are defined by type assignments. 

Definition 13.1.1 

Let $\Sigma$ be an alphabet on which an abstract term is to be built, and let $\Gamma_0$ be a set
of primitive types used to describe the uses of symbols in $\Sigma$.

A flat type over $\Gamma_0$ is either a primitive type $t \in \Gamma_0$, or

   $s_1 \times \cdots \times s_n \to t$ where $s_1, \ldots, s_n, t \in \Gamma_0$.

The set of all flat types over $\Gamma_0$ is called $\Gamma$.

A type assignment to $\Sigma$ is a binary relation $\tau \subseteq \Sigma \times \Gamma$.

For $f \in \Sigma$, when $\tau$ is understood, $f : t_1, \ldots, t_n$ means that $f \,\tau\, t_i$ for $i \in [1,n]$.
Usually $\tau$ is a function, so $n = 1$.

The typed language of $\Sigma$ and $\tau$, denoted $\Sigma_\tau$, is the set of all terms built from
symbols in $\Sigma$, respecting the types assigned by $\tau$. Formally,

If $(f,t) \in \tau$ for $t \in \Gamma_0$ then $f$ is in the typed language of $\Sigma$ and $\tau$, and $f$ has type $t$
(i.e., $f$ is a constant symbol of type $t$).

If $E_1, \ldots, E_n$ are in the typed language of $\Sigma$ and $\tau$, with types $s_1, \ldots, s_n \in \Gamma_0$, and
if $f : s_1 \times \cdots \times s_n \to t$, then $f(E_1, \ldots, E_n)$ is in the typed language of $\Sigma$ and $\tau$ with
type $t$.

$\tau$ is extended so that $E \,\tau\, t$, also written $E : t$, if $t$ is the type of an arbitrary term
$E \in \Sigma_\tau$.

□


$f(E_1, \ldots, E_n)$ denotes an abstract term, or tree, with $f$ at the head or root, and
subterms $E_1, \ldots, E_n$ attached to the root in order. In particular, $f(E_1, \ldots, E_n)$
does not denote a particular string of characters including parentheses and commas.
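Membership in the typed language of Definition 13.1.1 can be checked by a simple recursive descent over a term. The Python sketch below assumes the usual case in which the type assignment is a function, represents it as a dictionary from symbols to (argument types, result type), and represents a term as a tuple whose first component is the head symbol. The small alphabet TAU and the name type_of are hypothetical.

```python
# A hypothetical type assignment: symbol -> (argument types, result type).
TAU = {
    "numeral": (("N",), "S"),
    "plus": (("S", "S"), "S"),
    "times": (("S", "S"), "S"),
    "digit1": ((), "N"),
    "digit2": ((), "N"),
}


def type_of(term, tau=TAU):
    """Return the type of a term, or None if it is not in the typed language."""
    head, *args = term
    if head not in tau:
        return None
    arg_types, result = tau[head]
    if len(args) != len(arg_types):
        return None
    # Every subterm must be well typed, with the type the head symbol expects.
    for arg, expected in zip(args, arg_types):
        if type_of(arg, tau) != expected:
            return None
    return result
```

Here ("plus", ("numeral", ("digit1",)), ("numeral", ("digit2",))) has type S, while ("plus", ("digit1",), ("digit2",)) is rejected because plus expects arguments of type S, not N.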


Abstract syntaxes are merely the typed languages of typed alphabets as 
described above. These languages might naturally be called regular tree 
languages, because they can be recognized by nondeterministic finite automata 
running from the root to the leaves of a tree, splitting at each interior node to cover 
all of the sons (a different next state may be specified for each son), and accepting 
when every leaf yields an accepting state [Th73]. All of the technical devices of 
this section are found in earlier theoretical literature on tree automata and
grammars [Th73], but we believe that the particular combination here is worthy of
consideration for practical purposes.

Type assignments are almost context-free grammars; they merely lack the terminal
symbols. Each primitive type $t$ may be interpreted as a nonterminal symbol
$N_t$ in the context-free grammar, and the type assignment $f : s_1 \times \cdots \times s_n \to t$
corresponds to the production $N_t \to N_{s_1} \cdots N_{s_n}$. Notice that the type assignment
has one piece of information lacking in the context-free production - the name $f$.
Names for context-free productions clarify the parse trees, and make issues such as 
ambiguity less muddy. The terminal symbols, giving the actual strings in the 
context-free language, are given separately from the type assignment. This separa- 
tion has the advantage of allowing different concrete notations to be associated 
with the same abstract syntax for different purposes (e.g., internal storage, display 
on various output devices - the translation from a low-level assembly code to binary 
machine code might even be represented as another notation for the assembly 
code). Separation also clarifies the distinction between essential structure and 
superficial typography. We do not suggest that typography is unimportant
compared to the structure, merely that it is helpful to know the difference.
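The correspondence between type assignments and productions, and the information carried by the names, can be made concrete with a few lines of Python. The type assignment TAU and both function names are assumptions of this sketch. Only the nonterminal skeleton of each production is produced, since the terminal symbols are given separately.

```python
# A hypothetical type assignment: symbol -> (argument types, result type).
TAU = {
    "numeral": (("N",), "S"),
    "plus": (("S", "S"), "S"),
    "times": (("S", "S"), "S"),
}


def named_productions(tau):
    """f : s1 x ... x sn -> t becomes (f, t, (s1, ..., sn)), read as N_t -> N_s1 ... N_sn."""
    return [(f, result, args) for f, (args, result) in tau.items()]


def unnamed_productions(tau):
    """The same productions with the names dropped."""
    return {(result, args) for _, result, args in named_productions(tau)}
```

Without terminal symbols, plus and times yield the same skeleton S -> S S, so the unnamed set has two elements where the named list has three. The name is what keeps the two operations apart in the abstract syntax.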


The following definition introduces all of the formal concepts needed to define 
context-free languages in a way that separates essential issues from typographical 
ones, and that makes the relationship between strings and their abstract syntax 
trees more explicit and more flexible than in traditional grammatical notation. The 
idea is to define the abstract syntax trees first, by introducing the symbols that may 
appear at nodes of the trees, and assigning a type to each. Legitimate abstract 
syntaxes are merely the well-typed terms built from those symbols. Next, each 
symbol in the abstract syntax is associated with one or more notational schemata, 
showing how that symbol is denoted in the concrete syntax, as a string of
characters. Auxiliary symbols may be used to control the selection of a notational
schema when a single abstract symbol has several possible denotations. This
control facility does not allow new sets of strings to be defined, but it does allow a
given set of strings to be associated with abstract syntax trees in more ways than 
can be done by traditional grammar-driven parsing. 


The design of this definition was determined by three criteria. 


1. The resulting notation must handle common syntactic examples from the com- 
piler literature in a natural way. 

2. The translations defined by the notation must be closed under homomorphic 
(structure-preserving) encodings of terms. For example, if it is possible to 
make terms built out of the binary function symbol f correspond to strings in 
a particular way, then the same correspondence must be possible when f(x,y) 
   is systematically encoded as apply(apply(f,x),y), an encoding called
   Currying. If this criterion were not satisfied, a user might have to redesign the
   abstract syntax in order to achieve typographical goals. The whole point of
   the new notation is to avoid such interference between levels.


3. A natural subset of the new notation must equal the power of
   syntax-directed translation schemata [Ir61, LS68, AU72].


Definition 13.1.2 

Let cat be a binary symbol indicating concatenation. $cat(\alpha,\beta)$ is abbreviated $\alpha\beta$.
Let $V = \{x_1, x_2, \ldots\}$ be a set of formal variables, $V \cap \Sigma = \emptyset$. Let $A$ be a finite
alphabet of auxiliary symbols, $S \in A$ a designated start symbol. A notational
specification for a type assignment $\tau$ to abstract alphabet $\Sigma$ in concrete alphabet
$\Omega$ with auxiliary alphabet $A$ is a binary relation

   $\eta \subseteq (((\Sigma \cup V)_{\tau'} \cup \{empty\}) \times A) \times (\{cat\} \cup \Omega^* \cup (V \times A))_{\gamma}$,

where $\tau'$ is $\tau$ augmented so that each variable in $V$ has every type in $\Gamma_0$ (i.e., a
variable may occur anywhere in a term), and $\gamma$ is the type assignment such that each
word in $\Omega^*$, and each annotated variable in $V \times A$, is a single nullary symbol of type
STRING, and cat is of type STRING $\times$ STRING $\to$ STRING. empty denotes the empty
term, without even a head symbol. Without loss of generality, the variables on the
left-hand term in the $\eta$ relation are always $x_1, \ldots, x_m$, in order from left to right.
Notational specifications are described by productions of the form

   $E^A$ is denoted $F$,

indicating that $\langle E,A \rangle \,\eta\, F$. Within $F$, the auxiliary symbols are given as
superscripts on the variables. When the same pair $\langle E,A \rangle$ is related by $\eta$ to several
expressions $F_1, \ldots, F_m$, the $m$ elements of the relation are described in the form
$E^A$ is denoted $F_1$ or $\cdots$ or $F_m$. Multiple superscripts indicate that any of the
superscripts listed may be used. If only one element of $A$ may appear on a
particular symbol, then the superscript may be omitted.

A context-free notational specification is one in which the $\eta$ relation is restricted
so that when $\langle E,A \rangle \,\eta\, F$, no variable occurs more than once in $F$.

A simple notational specification is a context-free notational specification in which
the $\eta$ relation is further restricted so that when $\langle E,A \rangle \,\eta\, F$, the variables
$x_1, \ldots, x_m$ occur in order from left to right in $F$, possibly with omissions, but
without repetitions.

Each notational specification $\eta$ defines an interpretation $\bar{\eta} \subseteq (\Sigma_\tau \times A) \times \Omega^*$,
associating trees (abstract syntax) with strings of symbols (concrete syntax). The
interpretation is defined by

   $\langle E,A \rangle \,\eta\, \alpha \;\Rightarrow\; \langle E,A \rangle \,\bar{\eta}\, \alpha$ when $E$ contains no variables.

   $\langle E[x_1, \ldots, x_m], A \rangle \,\eta\, \alpha[\langle x_1,B_1 \rangle, \ldots, \langle x_n,B_n \rangle]$ &
   $\langle E_1,B_1 \rangle \,\bar{\eta}\, \beta_1$ & $\cdots$ & $\langle E_m,B_m \rangle \,\bar{\eta}\, \beta_m$ &
   $\langle empty,B_{m+1} \rangle \,\bar{\eta}\, \beta_{m+1}$ & $\cdots$ & $\langle empty,B_n \rangle \,\bar{\eta}\, \beta_n$
   $\;\Rightarrow\; \langle E[E_1, \ldots, E_m], A \rangle \,\bar{\eta}\, \alpha[\beta_1, \ldots, \beta_n]$

$E \,\bar{\eta}\, \alpha$ abbreviates $\langle E,S \rangle \,\bar{\eta}\, \alpha$, where $S$ is the start symbol.

The string language of a notational specification $\eta$ at type $t$ is

   $\{\alpha \in \Omega^* \mid \exists E \in \Sigma_\tau,\ (E,t) \in \tau \ \&\ E \,\bar{\eta}\, \alpha\}$.

□
The formal definition above is intuitively much simpler than it looks. Notational 
specifications give nondeterministic translations between terms and strings in a 
schematic style using formal variables. The intent of a single production is that 
the terms represented by the variables in any instance of that production should be 
translated first, then those translations should be combined as shown in the
production. In most cases, the left-hand side of a production is of the simple form
$f(x_1, \ldots, x_n)$. This form is enough to satisfy design criterion 1 above. The more
complex terms allowed on left-hand sides of productions are required to satisfy
criterion 2, and the empty left-hand sides are required for criterion 3. Greater
experience with the notation is required in order to judge whether the added
generality is worth the trouble in understanding its definition.


The string languages of context-free notational specifications and simple nota- 
tional specifications are precisely the context-free languages, but notational 
specifications offer more flexibility than context-free grammars for defining tree- 
string relations. Since the relation between the parse tree and a string has a prac- 
tical importance far beyond the set of parsable strings, this added flexibility 
appears to be worth the increase in complexity of the specifications, even if it is 
never used to define a non-context-free language. The type-notation pairs defined
here are more powerful than the syntax-directed translation schemata of Aho and
Ullman [AU72] when the abstract-syntax trees are encoded as strings in postorder.
Context-free notational specifications are equivalent in power to syntax-directed
translation schemata, and simple notational specifications are equivalent in power
to simple syntax-directed translation schemata and to pushdown transducers.
Independently of the theoretical power involved, the notation of this section
should be preferred to that of syntax-directed translation schemata, since it
makes the intuitive tree structure of the abstract syntax explicit, rather than encod- 
ing it into a string. Notice that auxiliary symbols are equivalent to a restricted use 
of inherited attributes in attribute grammars. Since these attributes are chosen 
from a finite alphabet, they do not increase the power of the formalism as a
language definer.


A notational specification is generally not acceptable unless it is complete, 
that is, every abstract term in the typed language has at least one denotation. 
Completeness is easy to detect automatically. Unfortunately, ambiguity, that is, 
the existence of two or more denotations for the same abstract term, is equivalent 
to ambiguity in context-free grammars, which is undecidable. For generating 
parsers, ambiguity is deadly, since the result of parsing is not well-defined. Just as 
with context-free grammars, we should probably enforce stronger decidable 
sufficient conditions for nonambiguity. For unparsers, ambiguity is usually
undesirable, but it might be occasionally useful when there is no practical need to
distinguish between two abstract terms.


Every context-free grammar decomposes naturally into a type assignment and 
a simple context-free notational specification, with a trivial $A$, and $\eta$ a function.
The natural decomposition is unique except for ordering of the arguments to each
function symbol, and choice of names for the function symbols. Similarly, every
context-free type-notation pair defines a unique natural context-free grammar.


Example 13.1.1 


Consider a context-free grammar for arithmetic expressions. 


S → N
S → S+S
S → S*S
S → (S)
N → 0|1|2|3|4|5|6|7|8|9
N → N0|N1| ··· |N9

This grammar decomposes naturally into the type assignment

numeral: N → S
plus: S×S → S
times: S×S → S
paren: S → S
digit0: N ··· digit9: N
extend0: N → N ··· extend9: N → N

and the context-free notational specification

numeral($x_1$) is denoted $x_1$
plus($x_1$,$x_2$) is denoted $x_1$+$x_2$
times($x_1$,$x_2$) is denoted $x_1$*$x_2$
paren($x_1$) is denoted ($x_1$)
digit0 is denoted 0 ··· digit9 is denoted 9
extend0($x_1$) is denoted $x_1$0 ··· extend9($x_1$) is denoted $x_1$9




The type-notation pair derived above is not the most intuitive one for arithmetic 
expressions, because the paren symbol has no semantic content, but is an artifact 
of the form of the grammar. To obtain a more intuitive notational specification,
for the type assignment that omits paren, delete the paren line from the notational
specification above, and replace the plus and times lines by the following.

plus($x_1$,$x_2$) is denoted $x_1$+$x_2$ or ($x_1$+$x_2$)
times($x_1$,$x_2$) is denoted $x_1$*$x_2$ or ($x_1$*$x_2$)

The auxiliary alphabet $A$ is still trivial, but the notational relation $\eta$ is no longer a
function.

□
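The effect of letting the notational relation offer a choice of denotations can be explored with a small Python sketch that enumerates every string denoting a given term under the plus and times productions with optional parentheses. The sketch makes simplifying assumptions: terms are tuples, numerals are collapsed into leaves of the hypothetical form ("num", "12") rather than built from digit and extend symbols, and the function name denotations is invented for the illustration.

```python
from itertools import product

OPERATORS = {"plus": "+", "times": "*"}


def denotations(term):
    """Return the set of all strings that denote the given term."""
    head, *args = term
    if head == "num":
        # Simplification: a numeral leaf carries its digit string directly.
        return {args[0]}
    op = OPERATORS[head]
    strings = set()
    for left, right in product(denotations(args[0]), denotations(args[1])):
        # Each operator production offers a bare and a parenthesized form.
        strings.add(f"{left}{op}{right}")
        strings.add(f"({left}{op}{right})")
    return strings
```

Under this specification the string 1+2*3 is produced both for plus(1,times(2,3)) and for times(plus(1,2),3), so the string alone does not determine the term, while (1+2)*3 is produced only for the second.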


The auxiliary symbols in A are not strictly necessary for defining an arbitrary 
context-free language, but they allow technical aspects of the parsing mechanism to 
be isolated in the notational specification, instead of complicating the type
assignment. For example, the extra nonterminal symbols often introduced into grammars
to enforce precedence conditions should not be treated as separate types, but
merely as parsing information attached to a single type.


Example 13.1.2 

The grammar of Example 13.1.1 is ambiguous. For example, 1+2*3 may be
parsed as plus(1,times(2,3)), or times(plus(1,2),3). The usual way to avoid the
ambiguity in the context-free grammar is to expand the set of nonterminal symbols
as follows.

S → S+T
S → T
T → T*R
T → R
R → (S)
R → N
N → 0|1|2|3|4|5|6|7|8|9
N → N0|N1| ··· |N9


This grammar gives precedence to * over +, and associates sequences of the same
operation to the left. The direct translation of this grammar yields the following
type assignment and notational specification.

plus: S×T → S
summand: T → S
times: T×R → T
multiplicand: R → T
paren: S → R
numeral: N → R
digit0: N  ···  digit9: N
extend0: N → N  ···  extend9: N → N

plus(x₁,x₂) is denoted x₁+x₂
summand(x₁) is denoted x₁
times(x₁,x₂) is denoted x₁*x₂
multiplicand(x₁) is denoted x₁
paren(x₁) is denoted (x₁)
numeral(x₁) is denoted x₁
digit0 is denoted 0  ···  digit9 is denoted 9
extend0(x₁) is denoted x₁0  ···  extend9(x₁) is denoted x₁9


As well as paren, we now have the semantically superfluous symbols summand and 
multiplicand, and types T and R that are semantically equivalent to S. In order 
to keep the abstract syntax of the second part of Example 13.1.1, while avoiding 
ambiguity, the parsing information given by these semantically superfluous symbols 
and types should be encoded into auxiliary symbols S, T, R as follows.

plus: S×S → S
times: S×S → S
numeral: N → S
digit0: N  ···  digit9: N
extend0: N → N  ···  extend9: N → N

plus(x₁,x₂)^S is denoted x₁^S+x₂^T or (x₁^S+x₂^T)
plus(x₁,x₂)^{T,R} is denoted (x₁^S+x₂^T)
times(x₁,x₂)^{S,T} is denoted x₁^T*x₂^R or (x₁^T*x₂^R)
times(x₁,x₂)^R is denoted (x₁^T*x₂^R)
numeral(x₁)^{S,T,R} is denoted x₁
digit0 is denoted 0  ···  digit9 is denoted 9
extend0(x₁) is denoted x₁0  ···  extend9(x₁) is denoted x₁9


□
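
The notational specification with auxiliary symbols can be read directly as an unparsing procedure. The following Python sketch is only an illustration under stated assumptions: terms are nested tuples (a hypothetical encoding, not the interpreter's own representation), the function name unparse is invented here, and the unparenthesized alternative is chosen wherever the specification offers a choice.

```python
# Hypothetical term encoding: ("plus", l, r), ("times", l, r), ("numeral", "12").
def unparse(term, aux="S"):
    """Return the denotation of term in a position labeled by auxiliary symbol aux."""
    op = term[0]
    if op == "numeral":                      # numerals look the same under S, T, and R
        return term[1]
    left, right = term[1], term[2]
    if op == "plus":                         # operands are labeled S and T
        text = unparse(left, "S") + "+" + unparse(right, "T")
        return text if aux == "S" else "(" + text + ")"
    if op == "times":                        # operands are labeled T and R
        text = unparse(left, "T") + "*" + unparse(right, "R")
        return text if aux in ("S", "T") else "(" + text + ")"
    raise ValueError("unknown operator: " + op)

one, two, three = (("numeral", d) for d in "123")
print(unparse(("plus", one, ("times", two, three))))    # 1+2*3
print(unparse(("times", ("plus", one, two), three)))    # (1+2)*3
print(unparse(("plus", one, ("plus", two, three))))     # 1+(2+3)
```

The two readings of 1+2*3 from the ambiguous grammar now unparse to different strings, and parentheses appear only where the label of the position demands them.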


Non-context-free notational specifications allow matching of labels at the
beginning and end of bracketed sections. For example, to define procedure
definitions in a conventional programming language so that the name of a
procedure appears at the end of its body, as well as in the heading, use the following
notational specification:

procedure(x₁,x₂,x₃) is denoted PROCEDURE x₁(x₂); x₃ END(x₁)


Given current software that is available off-the-shelf, the best thing to do with 
a context-free type-notation pair is probably to convert it into a context-free gram- 
mar, and apply the usual parsing techniques. The new notation does not show to 
greatest advantage under such usage, since the tricks that are required to avoid 
parsing conflicts may complicate the grammar by introducing otherwise
unnecessary auxiliary symbols. It is probably not a good idea to try to parse with
non-context-free type-notation pairs, although unparsing is no problem.


The ideal application of type-notation pairs depends on future availability of 
structure-editor generators, as an alternative to parsers [MvD82, Re84]. Structure
editors are fruits of the observation that source text was developed for particu-
lar input techniques, such as the use of punched cards, involving substantial off-line 
preparation before each submission. During the off-line work, a user needs to work 
with a presentation of his program that is simultaneously machine-readable and 
human-readable. With highly interactive input techniques, based on video termi- 
nals, there is no more requirement that what the user types correspond character 
by character to what he sees on the screen. Structure editors let short sequences of 
keystrokes produce simple manipulations of tree structures (i.e., abstract syntaxes), 
and instantly display the current structure on the screen. The process that must 
be automated is unparsing, a process technically much simpler than parsing, and 
immune to the problems of nondeterminism that arise in general context-free
parsing. Type assignments are ideal for defining the internal structures to be
manipulated by structure editors. The notational components still need
strengthening to deal with the two-dimensional nature of the video display.
Unparsing is easy even for non-context-free type-notation pairs.


13.2. Terms Representing the Syntax of Terms 


When equations are used to perform syntactic processing, we often need a way to 
distinguish different levels of meaning. For example, there is no way to write an 
equational program translating an arbitrary list of the form (F A₁ ··· Aₙ) to
F[A₁, ···, Aₙ]. We might be tempted to write

translate[(x . y)] = x[transargs[y]],


but F is a nullary symbol in the first instance, and an n-ary function symbol in the 
second. Also, the use of the variable x in place of a function symbol on the right- 
hand side is not allowed. Yet, the translation described above is a very natural 
part of a translation of LISP programs into equational programs. The problem is
that the symbol translate is defined above as if it operates on the objects denoted
by S-expressions, when, in fact, it is supposed to operate on the expressions
themselves. Consider the further trouble we would have with a syntactic processor
that counts the number of leaves in an arithmetic expression. We might write

count_leaves(add(x, y)) = add(count_leaves(x), count_leaves(y)).

Technically, this equation overlaps with the defining equations for add. Intuitively,
it is a monster.


To achieve sufficient transformational power for the LISP example, and to 
avoid the confusion of the add example, we need a notation for explicitly
describing other notations -- terms denoting terms. As long as we accept some
equations on terms, we are in trouble letting terms stand for themselves. Rather,
we need explicit functions whose use is to construct terms. One natural set of
functions for this purpose uses list notation based on nil and cons, as in LISP, plus
atomic_symbols as names of symbols within the terms being described, unary
functions litsym, atomsym, intnum, truthsym, and char to construct symbols of
various types from their names, and multiap to apply a function symbol
syntactically to a list of argument terms.


Example 13.2.1 
Using the notation defined above, we may translate LISP S-expressions to the
functional terms that they often represent. We must translate
multiap[litsym[cons]; (atomsym[F] . ···)] to multiap[litsym[F]; (···)]. This
may be accomplished naturally by the following equational program.

Symbols

: Constructors for terms
   nil: 0;
   cons: 2;
   litsym, atomsym, intnum, truthsym, char: 1;
   multiap: 2;
   include atomic_symbols;

: Translating operators
   translate: 1;
   transargs: 1.

For all x, y:
   translate[litsym[x]] = litsym[x];
   translate[atomsym[x]] = atomsym[x];
   translate[intnum[x]] = intnum[x];
   translate[truthsym[x]] = truthsym[x];
   translate[char[x]] = char[x];

   translate[multiap[litsym[cons]; (atomsym[x] . y)]] =
      multiap[litsym[x]; transargs[y]];

   transargs[()] = ();
   transargs[(x . y)] = (translate[x] . transargs[y]).

Similarly, the leaf counting program becomes

Symbols

: Constructors for terms
   nil: 0;
   cons: 2;
   litsym, intnum: 1;
   multiap: 2;
   include atomic_symbols;

: Counting operator
   count: 1;

: Arithmetic operator
   add: 2.

For all x, y:
   count[intnum[x]] = 1;
   count[multiap[litsym[add]; (x y)]] = add[count[x]; count[y]].


□


The examples above are not appealing to the eye, but the potential for confusion is 
so great that precision seems to be worth more than beauty in this case. Special 
denotations involving quote marks might be introduced, but they should be taken as 
abbreviations for an explicit form for denoting syntax, such as the one described
above.


In order to apply an equational program to perform a syntactic manipulation, 
the term to be manipulated should be in an explicitly syntactic form. Yet, the ini- 
tial production of that term as input, and the final output form for it, are unlikely 
to be explicitly syntactic themselves. For example, one use of the translation of S- 
expressions to functional forms is to take an S-expression, translate it into a func- 
tional form, evaluate the functional form, then translate back to an S-expression. 


The user who presents the initial S-expression only wants to have it evaluated, so
he should not need to be aware of the syntactic translations, and should be allowed
to present the S-expression in its usual form. In order to evaluate the functional 
form, it must not be given as explicit syntax, else it will not be understood by an 
evaluator of functional forms. From these considerations, we seem to need 
transformers in and out of explicit syntactic forms. These translators cannot be 
defined by the equation interpreter, without getting into an infinite regress, since 
the transformation of a term into its explicit syntactic form is itself a syntactic 
transformation. So, the implementation of the equation interpreter includes two 
programs called syntax and content. syntax transforms a term into its explicit
syntactic form, and content transforms an explicit syntactic form into the term that
it denotes. Thus, the syntax of (A . B) is

multiap[litsym[cons]; (atomsym[A] atomsym[B])],

and the content of

multiap[litsym[f]; (atomsym[A] intnum[22] atomsym[B])]

is f[A; 22; B].
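
The relationship between syntax and content can be sketched in Python, where the two functions live at the meta level and so avoid the regress described above. This is an illustration under assumptions only: the encoding of object-level terms (integers, strings for atomic symbols, and ("ap", f, args) for an application) is hypothetical, and only integers, atoms, and applications are covered.

```python
# Hypothetical object-level terms: an int, an atomic symbol (str), or
# ("ap", f, [args]) standing for the application f[args].
def syntax(term):
    """Map a term to its explicit syntactic form."""
    if isinstance(term, int):
        return ("intnum", term)
    if isinstance(term, str):
        return ("atomsym", term)
    _, f, args = term
    return ("multiap", ("litsym", f), [syntax(arg) for arg in args])

def content(explicit):
    """Map an explicit syntactic form back to the term it denotes."""
    kind = explicit[0]
    if kind in ("intnum", "atomsym"):
        return explicit[1]
    if kind == "multiap":
        return ("ap", explicit[1][1], [content(arg) for arg in explicit[2]])
    raise ValueError("not an explicit syntactic form covered by this sketch")

dotted_pair = ("ap", "cons", ["A", "B"])            # the S-expression (A . B)
print(syntax(dotted_pair))
print(content(("multiap", ("litsym", "f"),
               [("atomsym", "A"), ("intnum", 22), ("atomsym", "B")])))
```

In this sketch content undoes syntax, which is the property an evaluator needs in order to hide the syntactic translations from the user.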


13.3. Example: Type-Checking in a Term Language 

A typical problem in non-context-free syntactic processing is type checking, where 
the types of certain symbols are given by declarations lexically distant from the 
occurrences of the symbols themselves. While the most popular application of type 
checking occurs in processing conventional programming languages, such as Pascal, 
the essential ideas can be understood more easily by considering a simpler 
language, consisting of declarations followed by a single term. An advanced
compiler class taught by Christoph Hoffmann constructed a full type checker for
Pascal in 1983. Substantial extra effort was required to enforce declare-before-use
and similar restrictions included in the design of Pascal mostly to simplify
conventional compiler implementations.


We present the term language in the type-notation form of Section 13.1, and 
assume that some mechanism is already available to translate from concrete syntax 
to abstract syntax. The type S denotes a list of declarations paired with a term, E
denotes a term, D a declaration, A an atomic symbol and T a type. DL, EL, and
TL denote lists of declarations, terms, and types, respectively. S, E, and T are
also used as auxiliary symbols, corresponding intuitively to their uses as types. The
auxiliary symbol P indicates a type occurring within the left-hand side of a
functional type, and so needing to be parenthesized if it is a functional type itself. EC
and DC indicate nonempty lists of terms and declarations, respectively.


typed_term: DL×E → S
cons: D×DL → DL, E×EL → EL, T×TL → TL
nil: DL, EL, TL
declaration: A×T → D
type: A → T
function: T×TL → T
term: A → E
multiap: E×EL → E


typed_term(dl,e)^S is denoted dl^DC . e^E
cons(d,dl)^DC is denoted d; dl^DC or d dl^N
declaration(a,t) is denoted a:t^T
type(a)^{T,P} is denoted a
function(t,tl)^T is denoted tl^TC → t^T
function(t,tl)^P is denoted (tl^TC → t^T)
cons(t,tl)^TC is denoted t^P × tl^TC or t^P tl^N
term(a)^{E,P} is denoted a
multiap(e,el)^E is denoted e^P(el^EC)
multiap(e,el)^P is denoted (e^P(el^EC))
cons(e,el)^EC is denoted e^E, el^EC or e^E el^N
nil^N is denoted ε


The operator multiap above applies a function, which may be given by an arbi- 
trarily complex term, to a list of arguments. term constructs a primitive term, and 
type a primitive type, from an atomic symbol that is intended to be the name of 
the term or type. function(t,tl) represents the type of functions whose arguments
are of the types listed in tl, and whose result is of type t. The denotational
specifications above produce the minimum parenthesization needed to avoid
ambiguity. The empty list is denoted by the empty string. A typical element of the
language above is given by the concrete form

f: t×t → t;
g: (t×t → t)×t → t → t;
a: t.

f(a, (g(f,a))(a))


and the abstract form, given in LISP.M notation,

typed_term[
   (declaration[f; function[type[t]; (type[t] type[t])]]
    declaration[g;
       function[
          function[type[t]; (type[t])];
          (function[type[t]; (type[t] type[t])] type[t])
       ]
    ]
    declaration[a; type[t]]
   );
   multiap[
      term[f];
      (term[a]
       multiap[
          multiap[term[g]; (term[f] term[a])];
          (term[a])
       ]
      )
   ]
]


Although the presentation of the abstract syntax above is not very readable for 
even moderate sized declarations and terms, it has the flexibility needed to describe 
a wide class of computations on the terms. The substantial ad hoc extensions to 
the concrete syntax that would be required to denote portions of well-formed
expressions, and to present equations with variables, would end up being more
confusing than the general abstract form. In particular, the distinction between
semantic function application, involving the operators performing the type
checking, and syntactic function application in the term being checked, requires some
departure from conventional concrete notation. So, even though it is a very poor
display notation, the abstract syntax may be a good internal notation for defining
computations.

The following equational program uses the LISP.M notation to define type 
checking for the typed terms described above. We assume that equations are given 
defining operations on symbol tables, as described in Section 16.3. 


: check[typed_term[dl; e]] evaluates to true if the term e is type correct
: with respect to the declarations in dl.

Symbols:

: constructors for typed terms
: declaration[a; t] declares the atomic symbol a to have type t
: type[a] is a primitive type named by the atomic symbol a
: function[t; tl] is the type of functions with argument types given
      by the list tl, and value of type t
: term[a] is a primitive term named by the atomic symbol a
: multiap[e; el] is a term with head function symbol e applied to the list of
      arguments el
   typed_term: 2;
   declaration: 2;
   type: 1;
   function: 2;
   term: 1;
   multiap: 2;

: standard list constructors
   cons: 2;
   nil: 0;

: type manipulating operations
   typeof: 2;
   typelist: 2;
   resulttype: 1;
   argtypes: 1;

: primitive symbol table operations
   entertable: 3;
   emptytable: 0;
   lookup: 2;

: special type-checking operations
   check: 1;
   buildtable: 1;
   looklist: 2;
   checkargs: 2;
   typecheck: 2;

: standard logical symbols
   equ: 2;
   and: 2;
   include truth_values;

: atomic symbols used to identify type, constant, and function symbols
   include atomic_symbols.


For all d, dl, e, el, t, t1, t2, tl, tl1, tl2, a, a1, a2, st, b:

: To check a typed term, build a symbol table from the declarations,
: then check the term against the symbol table.

   check[typed_term[dl; e]] = typecheck[buildtable[dl]; e];

: The symbol table is built by the natural iteration through the list of
: declarations.

   buildtable[()] = emptytable[];
   buildtable[(declaration[a; t] . dl)] = entertable[buildtable[dl]; a; t];

: Final type checking goes recursively through the term. Whenever a
: function application is encountered, the type of the function is computed and
: checked for consistency with the types of the arguments.

   typecheck[st; ()] = true;
   typecheck[st; (e . el)] = and[typecheck[st; e]; typecheck[st; el]];
   typecheck[st; term[a]] = true;
   typecheck[st; multiap[e; el]] =
      and[and[typecheck[st; e];
              typecheck[st; el]];
          checkargs[argtypes[typeof[st; e]]; typelist[st; el]]];

   typelist[st; ()] = ();
   typelist[st; (e . el)] = (typeof[st; e] . typelist[st; el]);

   typeof[st; term[a]] = lookup[st; a];
   typeof[st; multiap[e; el]] = resulttype[typeof[st; e]];

   resulttype[function[t; tl]] = t;
   argtypes[function[t; tl]] = tl;

   checkargs[(); ()] = true;
   checkargs[(); (t . tl)] = false;
   checkargs[(t . tl); ()] = false;
   checkargs[(t1 . tl1); (t2 . tl2)] = and[equ[t1; t2]; checkargs[tl1; tl2]];

: Assume that equations are given for entertable, lookup.
: The standard equality test on atomic symbols is extended to type
      expressions by the natural recursion.

   equ[type[a1]; type[a2]] = equ[a1; a2];
   equ[function[t1; tl1]; function[t2; tl2]] =
      and[equ[t1; t2]; checkargs[tl1; tl2]];
   equ[function[t; tl]; type[a]] = false;
   equ[type[a]; function[t; tl]] = false;

   include equatom;


: and is the standard boolean function.
   and[false; b] = false;
   and[true; b] = b.


The equations above may easily be augmented to report conflicts and
omissions in the declarations. If type checking is to be followed by some sort of
semantic processing, such as interpreting or translating the term, it may be useful to
attach types (or other information from a symbol table) to the symbols in a term,
so that the semantic processing does not need the symbol table. This technique 
was used in the implementation of the equation interpreter itself: each variable 
symbol in a left-hand side term is annotated to indicate the allowable substitutions 
for that variable, and each variable in a right-hand side term is annotated to show 
its address on the corresponding left-hand side. These annotations provide pre- 
cisely the information needed by the semantic portion of the interpreter. To illus- 
trate the technique, we present equations annotating each symbol in a typed term 
with its type. 


: typenotes[st; e] annotates each symbol in the term e with the type assigned to it
: by the symbol table st. Type conflicts are marked for debugging purposes.

Symbols:

: constructors for types and terms used as in the previous program
   type: 1;
   function: 2;
   term: 1;
   multiap: 2;

: extra constructors for annotated terms
: aterm[a; t] is a primitive term named by the atomic symbol a of type t
: typeconflict[e] marks the term e as having a type conflict in the application
      of its head function symbol to inappropriate arguments
   aterm: 2;
   typeconflict: 1;

: standard list constructors
   cons: 2;
   nil: 0;

: type manipulating operations
   typeof: 2;
   typelist: 2;
   resulttype: 1;
   argtypes: 1;

: primitive symbol table operations
   lookup: 2;

: special type-checking operations
   looklist: 2;
   checkargs: 2;
   typenotes: 2;

: standard logical symbols
   equ: 2;
   and: 2;
   if: 3;
   include truth_values;

: atomic symbols are used to identify type, constant, and function symbols
   include atomic_symbols.

For all d, dl, e, el, t, t1, t2, tl, tl1, tl2, a, a1, a2, st, b, x, y:

: Annotation goes recursively through the term. Whenever a function
: application is encountered, the type of the function is computed and checked
: for consistency with the types of the arguments.

   typenotes[st; ()] = ();
   typenotes[st; (e . el)] = (typenotes[st; e] . typenotes[st; el]);
   typenotes[st; term[a]] = aterm[a; typeof[st; term[a]]];
   typenotes[st; multiap[e; el]] =
      if[checkargs[argtypes[typeof[st; e]]; typelist[st; el]];
         multiap[typenotes[st; e]; typenotes[st; el]];
         typeconflict[multiap[typenotes[st; e]; typenotes[st; el]]]];

: Assume that equations are given for entertable, lookup.
: Equations for the remaining operations are the same as in the previous program.

   if[true; x; y] = x;
   if[false; x; y] = y.




14. Modular Construction of Equational Definitions 


The equational programming language, although its concepts are far from being 
primitive operations on conventional computing machines, is not really a high-level 
language. Rather, it is the assembly language for an unusual abstract machine. 
The problem is that the set of equations constituting a program has no structure to 
help in organizing and understanding it. In order to be suitable for solving any but 
rather small problems, the equational programming language needs constructs that 
allow decomposition of a solution into manageable components, or modules, with 
semantically elegant combining operations. In particular, different sets of equa- 
tions, presented separately, must be combinable other than by textual concatena- 
tion, and the combining operation must protect the programmer from accidental 
coincidence of symbols in different components. Furthermore, sets of equations 
must be parameterized to allow variations on a single concept (e.g., recursion over 
a tree structure) to be produced from a single definition, rather than being gen- 
erated individually. In this chapter, we define a speculative set of combining 
operations for equational programs, and discuss possible means of implementation. 
None of these features has been implemented for the equational programming 
language, although some similarly motivated features are implemented in OBJ 
[BG77, Go84]. Libraries of predefined equations cannot be integrated well into the
equation interpreter until some good structuring constructs are implemented.


In order to combine sets of equations in coherent ways, we need to design the 
combining operations in terms of the meanings of equational programs, rather than 
their texts. Based on the scenario of Section 1, the meaning of an equational
program should be characterized by three things:

1. the language of terms over which equations are given;
2. a class of models of that language;
3. the subset of terms that are considered simple, or transparent, enough to be
   allowed as output.

Items 1 and 2 are standard concepts from universal algebra. For equational pro- 
gramming, the information in 2 may be given equivalently as a congruence relation 
on terms, but extensions to other logical languages might need the greater general- 
ity of classes of models. In any case, the use of classes of models is better in keep- 
ing with the intuitive spirit of our computing scenario. 

Definition 14.1
Let Σ be a ranked alphabet, ρ(a) the rank of a∈Σ, Σ# the set of terms over Σ.
A model of Σ is a pair <U,ψ>, where U, the universe, is any set, and ψ is a
mapping from Σ to functions over U with ψ(a): U^ρ(a) → U.
ψ extends naturally to Σ# by ψ(f(E₁,···,Eₙ)) = ψ(f)(ψ(E₁),···,ψ(Eₙ)).
A model <U,ψ> satisfies an equation E=F, written <U,ψ> ⊨ E=F, if
ψ(E)=ψ(F).
A set W of models satisfies a set E of equations, written W ⊨ E, if
<U,ψ> ⊨ E=F for all <U,ψ>∈W, E=F ∈ E.
The set of models of a set E of equations, written Mod(E), is
{<U,ψ> | <U,ψ> ⊨ E}.
The set of normal forms of a set E of equations, written Norm(E), is
{E∈Σ# | ∀F=G∈E, F is not a subterm of E}.
A computational world is a triple <Σ,W,N>, where Σ is a ranked alphabet, W is
a class of models of Σ, and N⊆Σ#. N is called the output set.
The world of a set of equations E over Σ, written World(E), is
<Σ, Mod(E), Norm(E)>.
□


The function World assigns meanings to equational programs by letting Σ be the
ranked alphabet defined by the Symbols section, and taking World(E) where E is
the set of all instances of equations in the program. The computing scenario of
Section 1 may be formalized by requiring, on input E, an output F ∈ Norm(E)
such that Mod(E) ⊨ E=F. In principle, we would like separate control of the
models and the output terms, but the pragmatic restriction that output terms must
be precisely the normal forms -- terms with no instances of left-hand sides of
equations -- limits us to constructs that generate both of these components of
program meanings in ways that make them compatible with the evaluation mechanism
of reduction. Notice that uniqueness of normal forms means that there may not be
two different terms F₁,F₂∈N with W ⊨ F₁=F₂, i.e., there must exist at least one
model in W in which ψ(F₁)≠ψ(F₂).


Explicit presentation of an equational program is one way to specify a
computational world; the purpose of this section is to explore others. Because this aspect
of our study of equational logic programming is quite new and primitive, we do not 
try to build a convenient notation for users yet. Rather, we explore constructs that 
are both meaningful and implementable, and leave the design of good syntax for 
them to the future. The OBJ project has gone much farther in the user-level 
design of constructs for structured definition of equational programs, and the con- 
structs proposed here are inspired by that work [BG77, Go84]. Our problem is to 
determine how well such constructs can be implemented in a form consistent with 
our computational techniques. In particular, we prefer implementations in which
all of the work associated with the structuring constructs is done during the
preprocessing of equations, so that the resulting interpreter is of the same sort that we
derive directly from a set of equations.

The most obvious way of constructing a computational world is to combine the 
information in two others. 

Definition 14.2
Given two ranked alphabets Σ₁⊆Σ₂, and a model <U,ψ> of Σ₂, we may restrict
this model naturally to a model of Σ₁ by restricting ψ to Σ₁. When two classes of
models, W₁ and W₂, are given over different alphabets, Σ₁ and Σ₂, the intersection
W₁∩W₂ is the set of all models of Σ₁∪Σ₂ whose restrictions to Σ₁ and Σ₂ are in
W₁ and W₂, respectively. In the case where Σ₁=Σ₂, this specializes naturally to the
conventional set-theoretic intersection.
Given two computational worlds, <Σ₁,W₁,N₁> and <Σ₂,W₂,N₂>, the sum
<Σ₁,W₁,N₁>+<Σ₂,W₂,N₂> is <Σ₁∪Σ₂, W₁∩W₂, N₁∪N₂>.
□

The sum described above is similar to the enrich operation of [BG77], except that 
enrich requires one of its arguments to be given by explicit equations. The sum of 
computational worlds corresponds roughly to concatenation of two sets of equa- 
tions. Even this simple combining form cannot be implemented by simple-minded 
textual concatenation, because a variable in one program may be a defined symbol 
in the other. If an equational program is syntactically cooked into a form where 
each instance of a variable is marked unambiguously as such, then concatenation of 
cooked forms will often accomplish the sum of worlds. The equation interpreter is 
implemented in such a way that the cooked form is created explicitly, and may
provide the basis for a very simple implementation of the sum. For greater efficiency,
we would like to implement the sum at the level of the pattern-matching tables
that drive the interpreter. Such an implementation depends critically on the choice
of pattern matching technique (see Section 18.2).


The concatenation of syntactically cooked programs described above succeeds 
only when the combined equations satisfy the restrictions of Section 5. Certainly, 
this need not be the case, since the sum of computational worlds with unique
normal forms may not have unique normal forms. For example, combining the
singleton sets {a=b} and {a=c} fails to preserve uniqueness of normal forms. This
problem is inherent in the semantics of the sum, and can only be solved completely by
an implementation that deals with indeterminate programs. Unfortunately, there
are cases where the semantics of the sum does maintain uniqueness of normal
forms, but the concatenation of equations fails nonetheless.


Example 14.1 

Consider the following two sets of equations: 

1. {f(g(x))=a, a=b}
2. {g(c)=c, b=a}

Each of these sets individually has unique normal forms, and every reduction
sequence terminates. Their sum has uniqueness of normal forms, but not finite
termination. Notice that, in the sum, the unique normal form of f(g(c)) is f(c), but
there is also an infinite reduction f(g(c)) → a → b → a → ···. The combined set of
equations cannot be processed by the equation interpreter, because of the overlap
between f(g(x)) and g(c).
□


This failure to achieve an implementation of the sum in all semantically natural
cases is a weakness of the current state of the equation interpreter. Of course, one
may fiddle with the semantics of the sum to get any implementation to work, but
we would rather seek implementation techniques that approach closer to the
semantic ideal.

The sum allows for combining equational programs, but by itself is not very 
useful. We need a way of hiding certain symbols in a program, so that sums do 
not produce surprising results due to the accidental use of the same symbol in both 
summands. Other changes in notation will be required so that a module is not res- 
tricted to interact only with other modules using the same notational conventions. 
Both of these needs are satisfied by a single semantic construct. 

Definition 14.3
Given a model <U₁,ψ₁> over the alphabet Σ₁, a model <U₂,ψ₂> over the
alphabet Σ₂, and a relation δ⊆Σ₁#×Σ₂#, <U₁,ψ₁> δ <U₂,ψ₂> if

∀E₁,F₁∈Σ₁# E₂,F₂∈Σ₂#  E₁δE₂ & F₁δF₂ & ψ₁(E₁)=ψ₁(F₁) ⇒ ψ₂(E₂)=ψ₂(F₂)

For a set W of models over Σ₁, δ[W] is the set of all models over Σ₂ that are in the
δ relation to some member of W.
For a subset N⊆Σ₁#, δ[N] is the set of all terms in Σ₂# that are not in the δ
relation to any term not in N.
Given a computational world <Σ₁,W,N>, an alphabet Σ₂, and a relation
δ⊆Σ₁#×Σ₂#, the syntactic transform σ[<Σ₁,W,N>, δ, Σ₂] is <Σ₂, δ[W], δ[N]>.
□


The syntactic transform defined above may accomplish hiding of symbols, by
letting Σ₂⊆Σ₁, with δ the equality relation restricted to Σ₂#. A change of notation, for
example letting f(x,y,z) represent g(x,h(y,z)), is accomplished by a δ relating
each term E to the result E' of replacing every subterm of the form
g(F₁,h(F₂,F₃)) with the form f(F₁,F₂,F₃). The syntactic transform is similar to
the derive operator of [BG77].


The syntactic transform for symbol hiding may be implemented by renaming
the symbols in Σ₁ − Σ₂ in such a way that they can never coincide with names
produced elsewhere. Other syntactic transforms require different implementations
depending on the characteristics of the δ relation. Suppose δ is a partial function,
defined by a set of equations suitable for the equation interpreter. Add to the
equations defining δ the equation

δ(v) = v where v is a variable symbol.

The resulting equational program may be used to transform equations for
<Σ₁,W,N> into equations defining σ[<Σ₁,W,N>, δ, Σ₂]. If the transformation
eliminates all instances of symbols not in Σ₂, and produces equations satisfying the
restrictions of Section 5, then we have an implementation of the syntactically
transformed world. We have not yet determined how easy or hard it is to produce
definitions of useful δs allowing for this sort of transformation.


If δ is a one-to-one correspondence between some subset of Σ₁# and a
subset of Σ₂#, and if δ and δ⁻¹ can be programmed nicely, then the syntactic
transform may be implemented by applying δ⁻¹ to the input, applying an existing
program for <Σ₁,W,N> to that result, and finally applying δ. If δ is defined by
an equational program, and if the result of reversing each equation satisfies the
restrictions of Section 5, then we may use those reversals as a program for δ⁻¹.
Alternatively, given equational programs for δ and δ⁻¹ we may try to verify that they
are inverses by applying the program for δ⁻¹ to the right-hand sides of equations in
the program for δ, to see if we can derive the corresponding left-hand sides.


Further study is required to determine whether either of these techniques works
often enough to be useful. In any case, the last two techniques produce something
more complex than a single equational program, raising the question of how to
continue combining their results by more applications of sum and syntactic transform.
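
The three-step implementation (apply the inverse transform, run the existing
program, apply the transform) can be sketched in Python for the change of
notation that lets f(x,y,z) stand for g(x,h(y,z)). This is only an illustration
under stated assumptions: terms are modelled as nested tuples, and the names
delta, delta_inv, and transform are hypothetical, not part of the equation
interpreter.

```python
# Terms are tuples (symbol, arg1, ..., argn); constants and variables are strings.

def delta(term):
    """Replace every subterm g(F1, h(F2, F3)) by f(F1, F2, F3), bottom-up."""
    if isinstance(term, str):
        return term
    sym, *args = term
    args = [delta(a) for a in args]
    if (sym == "g" and len(args) == 2 and isinstance(args[1], tuple)
            and args[1][0] == "h" and len(args[1]) == 3):
        return ("f", args[0], args[1][1], args[1][2])
    return (sym, *args)

def delta_inv(term):
    """Replace every subterm f(F1, F2, F3) by g(F1, h(F2, F3))."""
    if isinstance(term, str):
        return term
    sym, *args = term
    args = [delta_inv(a) for a in args]
    if sym == "f" and len(args) == 3:
        return ("g", args[0], ("h", args[1], args[2]))
    return (sym, *args)

def transform(program, term):
    """Run a program written for the old notation on a term in the new one."""
    return delta(program(delta_inv(term)))

# Assumed example term in the new notation: f(a, f(b, c, d), e).
example = ("f", "a", ("f", "b", "c", "d"), "e")
```

Here delta_inv(example) is g(a, h(g(b, h(c, d)), e)), and applying delta to that
term recovers example, which is the inverse check described above carried out
on a single term.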


15. High-Level Programming Techniques 


This section treats high-level programming concepts that fit naturally on the
evaluation mechanism of the equation interpreter. The current version of the
interpreter does not provide any special syntax to support these techniques, but future
work may lead to convenient syntaxes for applying them.


15.1. Concurrency 


Although the current implementation of the equation interpreter runs on conven- 
tional sequential computing machines, some qualities of equational logic show 
interesting potential for implementation on parallel machines of the future. Even 
with a sequential implementation, the ability to express programming concepts 
based on concurrent computations gives a conceptual advantage. There are three 
distinguishable sources of concurrent programming power in a reduction-based 
evaluation for equational logic, two of which are provided by the current
implementation.


The simplest source of concurrent programming power arises from the possi- 
bility of evaluating independent subexpressions concurrently. Roughly speaking, 
this means that several, or all, of the arguments to a function may be evaluated 
simultaneously (in general, evaluation of a single argument may be partial instead 
of complete). This potential concurrency arises simply from the absence of side- 
effects in equational evaluation, and is the same as the potential concurrency in 
Functional Programming languages [Ba78]. A genuinely parallel implementation 
of the equational interpreter might realize a substantial advantage from such
concurrency. There is also a definite conceptual advantage to the programmer in
knowing that order of evaluation of arguments is irrelevant to the final result. In
most cases, however, no style of programming is supported that could not translate
quite simply into sequential evaluation in some particular order.


Concurrent evaluation of independent subexpressions becomes an essential 
feature when it is possible to reach a normal form without completing all of the 
evaluations of subexpressions, and when there is no way a priori to determine how 
much evaluation of which subexpressions is required. The most natural example of 
this behavior is in the parallel or, defined by the equations: 

or(true, x) = true;
or(x, true) = true;
or(false, false) = false.


Intuitively, it seems essential to evaluate the arguments of the or concurrently, 
since one of them might evaluate to true while the other evaluation fails to ter- 
minate. In the presence of other equations defining other computable functions, 
there is no way to predict which of the arguments will evaluate to a truth value. 
Translation of the intuitively concurrent evaluation of an expression containing ors 
into a sequential computation requires a rather elaborate mechanism for explicitly 
interleaving evaluation steps on the two arguments of an or. The current
implementation of the equation interpreter does not support the essential concurrency
involved in the parallel or, because of restriction 5 of Section 5
(left-sequentiality). Section 19 presents evidence that the parallel or cannot be
satisfactorily simulated by the sequentializable definitions allowed by the current
implementation. Future versions will support the parallel or, and similar definitions.
The basis of the implementation is a multitasking simulation of concurrency on a
sequential machine. We do not know yet how small the overhead of the multitasking
can be, and that stage of the project awaits a careful study of the concrete
details involved. The general problems of supporting this sort of concurrency on
conventional sequential machines are discussed in Section 18.3.
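
The explicit interleaving needed to simulate the parallel or on a sequential
machine can be illustrated by a small Python sketch. It is not the multitasking
mechanism planned for the interpreter. It assumes that each argument is a
Python generator that yields once per evaluation step and returns a truth value
if it ever terminates. The helper names diverge and after are hypothetical.

```python
def parallel_or(left, right):
    """Alternate single evaluation steps between the two arguments."""
    pending = [left, right]
    while pending:
        for arg in list(pending):
            try:
                next(arg)                     # one evaluation step on this argument
            except StopIteration as done:
                pending.remove(arg)           # this argument reached a truth value
                if done.value is True:
                    return True               # or(true, x) = or(x, true) = true
    return False                              # or(false, false) = false

def diverge():
    """An argument whose evaluation never terminates."""
    while True:
        yield

def after(steps, value):
    """An argument that reaches the given truth value after some steps."""
    for _ in range(steps):
        yield
    return value
```

With these definitions parallel_or(diverge(), after(3, True)) terminates with
True, although evaluating the first argument to completion would never return.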


The final source of concurrent behavior is easily overlooked, but is arguably 
the most useful of them all. This sort of concurrency results from the outermost, 
or "lazy", evaluation strategy, that allows nested subexpressions to be reduced con- 
currently. Roughly, this means that evaluation of a function value may go on con- 
currently with evaluation of the arguments. Every piece of partial information 
about the arguments may immediately result in partial, or even complete, evalua- 
tion of a function value. As with the parallel or, this sort of concurrency translates 
into sequential behavior only at the cost of a careful interleaving of steps from the 
intuitively concurrent components of a computation. Unlike the unconstrained 
interleaving of the parallel or, the interleaving of nested evaluation steps is highly 
constrained by the need to generate enough information about arguments to allow 
an evaluation step on a function of those arguments. Nested concurrency is sup- 
ported by the current version of the interpreter, as it depends only on the outermost 
evaluation strategy. It allows a style of programming based on pipelined
coroutines, or more generally on dataflow graphs, as shown in Section 15.3.
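
The pipelined behavior of nested evaluation can be imitated with Python
generators, where each stage runs only when the stage after it demands a value.
The sketch below is an analogy, not the interpreter's mechanism. The stage names
naturals and squares and the trace list are assumptions made for illustration.

```python
from itertools import islice

trace = []  # records the order in which the stages actually run

def naturals():
    """Producer: the infinite sequence 0, 1, 2, ..."""
    n = 0
    while True:
        trace.append(f"produce {n}")
        yield n
        n += 1

def squares(xs):
    """Consumer of xs and producer of squares; works one element at a time."""
    for x in xs:
        trace.append(f"square {x}")
        yield x * x

# Demand only the first three outputs of the pipeline.
result = list(islice(squares(naturals()), 3))
```

The trace alternates between the two stages (produce 0, square 0, produce 1,
and so on), so each piece of partial information about the argument is used as
soon as it is available, and the infinite producer is never run to completion.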


15.2. Nondeterminism vs. Indeterminacy


Dijkstra argues eloquently [Di76] that programmers should not be required to 
specify details that are inessential to solving a given problem. Not only is inessen- 
tial detail wasteful of programming time, it may have an adverse effect on the clar- 
ity of the program, by requiring a reader to separate the essential issues from the 
inessential. On this consideration, Dijkstra proposes a nondeterministic
programming language, based on guarded commands. Not only are the computation steps
nondeterministic, the final answer of a guarded-command program may not be
uniquely determined by the inputs. This indeterminacy in the output is sometimes
desirable in addition to nondeterminism in the computation steps, since many
problems admit several acceptable answers.


The equation interpreter supports some of the advantages of nondeterministic 
specification of evaluation steps. Although the current implementation chooses, 
rather arbitrarily, to evaluate from left to right, the semantics of the language 
allow for any order of evaluation that finds a normal form. The presence or 
absence of real nondeterminism in the implementation is never visible to a user in 
the results of evaluation, since the restrictions of Section 5 guarantee uniqueness of 
the normal form, independently of order of evaluation. This guarantee of unique- 
ness is helpful in solving problems with uniquely determined answers, but it 
prevents taking full advantage of the simplifications allowed by nondeterministic 
and indeterminate programs for those problems with several acceptable answers. In 
particular, it prevents a satisfying implementation of guarded commands [Di76] by 
equations. The ideal facility would allow equational definitions with multiple
normal forms, but recognize special cases where uniqueness is guaranteed.


There are no plans to introduce indeterminacy into the equation interpreter, 
not because of anything fundamentally repugnant about indeterminacy, but 
because the only simple way that we know to relax the uniqueness of normal forms 
causes other problems in the semantics of equational programming. The unique- 
ness of normal forms in equations satisfying the restrictions of Section 5 is a conse- 
quence of the Church-Rosser, or confluence, property. This property says that, 
whenever an expression A may reduce to two different expressions B and C, then 
there is another expression D to which both B and C reduce. The confluence
property is required, not only to guarantee uniqueness of normal forms, but also to
guarantee that some normal form will be found whenever such exists. The problem
arises when some expression A reduces to normal form, and also has an infinite
reduction sequence. The confluence property guarantees that, no matter how far
we pursue the infinite path, it is still possible to reduce to the normal form.
Without the confluence property, we may have such an A, and an expression B
appearing on its infinite reduction sequence, but B may not be reducible to normal
form. Yet, by the semantics of equational logic, B is equal to A, and therefore to
any normal form of A. Figure 15.2.1 shows this situation graphically.


Figure 15.2.1


So, in a system of equations without the confluence property, there may be an
expression B with a normal form, but the only way to find that normal form may
require backwards reductions. All of the helpful theory used to find an efficient
forward reduction to normal form fails to apply to backward reductions. It seems
quite unlikely that a weaker condition than confluence can be designed to allow
indeterminacy, but guarantee reductions to normal forms, so an indeterminate
equation interpreter probably requires a substantial new concept in evaluation
techniques. Even if such conditions are found, equational programming is probably the
wrong setting for indeterminacy, since it would be impossible to have, for example,
an expression α equal to each of the normal forms a and b, and an expression β
equal to each of the normal forms b and c, without also having a = c and β = a.
The computation techniques of reduction sequences could be applied to asymmetric
congruence-like relations in order to produce indeterminacy, but such relations are
not well established in logic, as is the symmetric congruence relation of equality.
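
The difficulty with non-confluent systems can be made concrete with a very
small example. The Python sketch below uses an assumed three-expression system
(A reduces to N or to B, B reduces only to itself, N is a normal form). The
names RULES, reachable, and normal_forms are hypothetical, and the system is
chosen only to reproduce the situation of an expression B that is equal to a
normal form but cannot reach it by forward reduction.

```python
# One-step reductions of a hypothetical non-confluent system.
RULES = {"A": ["N", "B"], "B": ["B"], "N": []}

def reachable(start):
    """All expressions reachable from start by forward reduction steps."""
    seen, frontier = {start}, [start]
    while frontier:
        expr = frontier.pop()
        for nxt in RULES[expr]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def normal_forms(start):
    """Normal forms (expressions with no reduction) reachable from start."""
    return {expr for expr in reachable(start) if not RULES[expr]}
```

Here normal_forms("A") is {"N"} while normal_forms("B") is empty. Since A
reduces to both, B = A = N holds in the equational semantics, yet N can be
found from B only by first going backwards from B to A.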


15.3. Dataflow 

The nested concurrency provided by outermost evaluation allows a very simple 
translation of dataflow programs into equational programs. A dataflow program 
[KM66] consists of a directed graph with an optional entry and an exit. The entry, 
if any, represents an input sequence of values, the exit an output sequence. Each 
edge in the graph represents a sequence of values communicated between two 
processes, represented by the nodes. In the semantics of a dataflow graph, the 
processes at the nodes are restricted to view their incoming sequences in order, and 
to produce each outgoing sequence in order, although several proposals for imple- 
menting them use numerical tags to simulate the order, and allow the actual time 
sequence to be different. A common kind of node, called a copy node, contains a 
process that merely copies its single incoming edge to each of its outgoing edges. It 
is easier for our purposes to group a copy node and its edges into one multiedge
with a single head and several tails.

Every dataflow graph in which the node processes are all determinate as
functions of their incoming sequences (i.e., they cannot make arbitrary or
timing-dependent steps) may be translated easily into an equational program running on
the equation interpreter. The basic idea is to use a convenient list notation, such as
LISP.M, and replace each edge in the dataflow graph by a constant symbol, whose
value will turn out to be the (possibly infinite) sequence of values transmitted
through that edge. Each node becomes a set of function symbols, one for each
outgoing edge, with arity equal to the number of incoming edges. Equations are
written to define each node function. Finally, the connection structure of the dataflow
graph is realized by a single defining equation for each of the edge constants. That
is, if f is the function symbol representing a node with incoming edges a and b,
and outgoing edge c, include the equation c[] = f[a[]; b[]]. In some cases structural
equations may be condensed with some of the equations defining node processes.
In particular, tree-shaped subgraphs of the dataflow graph may be condensed into
single expressions, using edge names only to break loops. A single example should
suffice to illustrate the technique.


Example 15.3.1

Consider the dataflow graph of Figure 15.3.1. The labels on edges and nodes
indicate the corresponding constant and function symbols. Assuming that equations
are given to define f, g, h, the following equations give the structure of the graph:

a[] = f[input[]; b[]];
b[] = g[a[]];
output[] = h[a[]; b[]].

Figure 15.3.1

These equations may be condensed by eliminating either a or b (but not both),
yielding

b[] = g[f[input[]; b[]]];
output[] = h[f[input[]; b[]]; b[]].

or

a[] = f[input[]; g[a[]]];
output[] = h[a[]; g[a[]]].

□


In order to make use of the equational translation of a dataflow program, we also 
require a definition of input, and some mechanism for selecting the desired part of 
output, unless it is known to be finite (and even short). The idea of translating 
dataflow graphs into equations comes from Lucid [WA85]. 

For a more substantial example of dataflow programming, consider the prime
number sieve of Eratosthenes. This elegant version of the prime number sieve is
based on a dataflow program by McIlroy [Mc68, KM77]. The basic idea is to take
the infinite list (2 3 4 ...), and remove all multiples of primes, in order to produce
the infinite list of primes (the same one used to produce the multiples of primes
that must be removed to produce itself). Figure 15.3.2 shows the dataflow graph
corresponding to this idea, with each edge and node labelled by its corresponding
symbol in the equational program.


Figure 15.3.2


The following equational program captures the concept of the dataflow graph
above, varying the notation a bit for clarity. Of course, in the equational program,
it is essential to evaluate, not the infinite lists themselves, but an expression
producing some finite sublist.


Example 15.3.2


: The following definitions are intended to allow production of lists of
: prime numbers. The list of the first i prime numbers is firstn[i; primes[]].

Symbols

: List construction and manipulation operators
cons: 2;
nil: 0;
first: 1;
tail: 1;
firstn: 2;

: Logical and arithmetic operators
if: 3;
add, subtract, multiply, modulo, equ, less: 2;

: Operators associated with the prime sieve
intlist: 1;
sieve: 2;
fact: 2;
primes: 0;

: Primitive domains
include integer_numerals, truth_values.

For all i, j, q, r:

: first[q] is the first element in the list q.
: tail[q] is the list of all but the first element in the list q.
first[(i . q)] = i;  tail[(i . q)] = q;

: firstn[i; q] is the list of the first i elements in the list q.
firstn[i; q] = if[equ[i; 0]; (); (first[q] . firstn[subtract[i; 1]; tail[q]])];

: if is the standard conditional function.
if[true; i; j] = i;  if[false; i; j] = j;

include addint, multint, subint, modint, equint, lessint;

: intlist[i] is the infinite list (i i+1 i+2 ...).
intlist[i] = (i . intlist[add[i; 1]]);

: The definitions of sieve and fact assume that r is given in increasing order, that
: r contains all of the prime numbers, and that r contains nothing less than 2.

: sieve[q; r] is the infinite list of those elements of the infinite list q that are
: not multiples of anything on the infinite list r.
sieve[(i . q); r] = if[fact[i; r]; sieve[q; r]; (i . sieve[q; r])];

: fact[i; r] is true iff the infinite list r contains a nontrivial factor of i.
fact[i; (j . r)] = if[less[i; multiply[j; j]];
                      false;
                      if[equ[modulo[i; j]; 0];
                         true;
                         fact[i; r]]];

: primes[] is the infinite list of prime numbers, (2 3 5 7 11 13 17 ...).
primes[] = (2 . sieve[intlist[3]; primes[]]).


The correct behavior of this dataflow program for primes depends on the
not-so-obvious fact that there is always a prime between n and n². If not, the loop
through the sieve and cons nodes would deadlock.
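
For readers who want to experiment with the feedback loop outside the equation
interpreter, the following Python sketch imitates the same program. It is an
analogy under stated assumptions: the Stream class is a hypothetical memoized
lazy list standing in for outermost evaluation with sharing, and the function
names follow the equations above but are ordinary Python functions.

```python
from itertools import count

class Stream:
    """A memoized lazy list: cells are computed on demand and then shared."""
    def __init__(self, make_iter):
        self._make_iter = make_iter   # called once, on first demand
        self._iter = None
        self._cells = []

    def __getitem__(self, k):
        if self._iter is None:
            self._iter = self._make_iter()
        while len(self._cells) <= k:
            self._cells.append(next(self._iter))
        return self._cells[k]

def fact(i, r):
    """True iff the increasing stream r contains a nontrivial factor of i."""
    k = 0
    while True:
        j = r[k]
        if i < j * j:
            return False
        if i % j == 0:
            return True
        k += 1

def sieve(q, r):
    """Elements of q that are not multiples of anything on r."""
    for i in q:
        if not fact(i, r):
            yield i

def primes_cells():
    yield 2
    yield from sieve(count(3), primes)   # the loop back to primes itself

primes = Stream(primes_cells)

def firstn(n, s):
    return [s[k] for k in range(n)]
```

In this sketch the test i < j * j always succeeds on a cell of primes that has
already been produced, which is where the fact about primes quoted above is
used. If it ever failed, the generator would be asked for a new cell while
still computing one, the Python counterpart of the deadlock.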


While the outermost evaluation strategy guarantees that the internal computa- 
tions of an equational program will satisfy the semantics of an associated dataflow 
graph, there is still an interesting issue related to input to and output from the 
interpreter. At the abstract level, input is any expression, and output is a 
corresponding normal form. It is semantically legitimate to think of the input 
expression being provided instantaneously at the beginning of the computation, and 
the output expression being produced instantaneously at the end. An implementa- 
tion following that idea strictly will not allow a dataflow behavior at the input and 
output interfaces of the interpreter. The all-at-once style of input and output forms 
a barrier to the composition of several equational programs, since it forces the
(possibly very large) expressions transmitted between them to be produced totally,
before the receiving program may start. In the worst case, an infinite term may be
produced, even though only a finite part of it is required by the next step.


The current version of the equation interpreter incorporates a partial dataflow 
interface on output, but none on input. The output pretty-printers do not support 
incremental output, so a user may only benefit by using the internal form of
output, or writing his own incremental pretty-printer. Interestingly, incremental
output from the interpreter was easier to code than the all-at-once form. An output
process drives the whole evaluation, by traversing the input expression. When it
reaches a symbol that is stable -- that can clearly never change as a result of 
further reduction -- it outputs that symbol, and breaks the evaluation problem into 
several independent problems associated with the arguments to the stable symbol. 
Section 17.2 gives a careful definition of stability, and Section 18.2 describes how it 
is detected. When the output process finds an unstable symbol, it initiates evalua- 
tion, which proceeds only until that particular symbol is stable. The control struc- 
ture described above is precisely the right one for outermost evaluation, even if 
incremental output is not desired. 

In the computation described above, the interpreter program must contain an 
a priori choice of traversal order for expressions -- leftmost being the natural
order given current typographical conventions. Leftmost production of an output 
expression would support natural incremental output of lists, using the conventional 
LISP notation. Since the essential data structure of the equation interpreter is 
trees, not lists, and there is no reason to expect all applications to respect a left- 
most discipline, a future version of the interpreter should support a more flexible 
output interface. Probably the right way to achieve such an interface is to think of 
an interactive question-answer dialogue between the consumer of output and the 
interpreter. The precise design of the question-answer language requires further 
study, but it is likely to be something like the following. The output consumer 
possesses at least one, and probably several, cursors that point to particular subex- 
pressions of the normal form expression coming from the interpreter. The output 
consumer may issue commands of the form "move cursor x to a", where a may be 
the father, or any specified son, of x’s current position. The interpreter does
precisely enough evaluation to discover the new symbol under the cursor after each
motion, and reports that symbol on its output. Thus, the order of traversal may be
determined dynamically as a result of the symbols seen so far.
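
Since the design of the question-answer language is left open, the following
Python sketch should be read only as one possible shape for it. The classes
Lazy and Cursor, the commands son and father, and the example tree intlist are
all assumptions made for illustration. None of them is an interface of the
equation interpreter.

```python
class Lazy:
    """A node whose symbol and sons are computed only when first inspected."""
    def __init__(self, thunk):
        self._thunk = thunk          # returns (symbol, [Lazy, ...])
        self._value = None

    def force(self):
        if self._value is None:
            self._value = self._thunk()
        return self._value

class Cursor:
    """Points at one subexpression and reports the symbol found there."""
    def __init__(self, root):
        self._path = [root]

    def symbol(self):
        return self._path[-1].force()[0]

    def son(self, k):
        """Move to the k-th son (counting from 0) and report its symbol."""
        self._path.append(self._path[-1].force()[1][k])
        return self.symbol()

    def father(self):
        """Move back to the father, if any, and report its symbol."""
        if len(self._path) > 1:
            self._path.pop()
        return self.symbol()

forced = []  # records which cons cells were actually evaluated

def intlist(i):
    """The infinite list (i i+1 i+2 ...) as a lazily built tree of cons nodes."""
    def thunk():
        forced.append(i)
        return ("cons", [Lazy(lambda: (str(i), [])), intlist(i + 1)])
    return Lazy(thunk)

cursor = Cursor(intlist(3))
```

Moving the cursor to the second son of the root and then to that node's first
son reports the symbols cons and 4, and only the cons cells for 3 and 4 are
ever evaluated, however large the rest of the expression may be.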


The question-answer output interface described above may also be used for the 
input interface, with the equation interpreter producing the questions, and some 
input process providing the answers. Such an interface will certainly allow connec- 
tion of several equational programs in a generalized sort of pipeline. Another sort 
of input interface is likely to be more valuable, and also more difficult to construct. 
In order for equational programs to be used as interactive back ends behind struc- 
ture editors, a purpose for which the equational programming style appears to be 
nicely suited as argued in Section 13, an incremental input interface must be pro- 
vided in which the input process (essentially the user himself) determines the order 
in which the input expression is produced. Worse, existing portions of the input 
term may change. Changing input is substantially more difficult to accommodate 
than incrementally produced input. Besides requiring an efficient mechanism for 
incremental reevaluation, avoiding the reevaluation of unchanged portions of an 
expression, some adjustment of the output interface is required to notify the output 
consumer of changes as they occur. Apparently, the output consumer must be able 
to specify a region of interest, and be notified of whatever happens in that region of 
the expression. Further study is required to find a convenient representation of a 
region of interest, taking into account that the topology of the expression is subject
to change as well as the symbol appearing at a particular node.


The incremental evaluation problem for equational programs appears to be 
substantially more difficult than that for attribute grammars [RTD83]. The 
elegance of Teitelbaum’s and Reps’ optimal reevaluation strategies for attribute
grammars gives that formalism a strong edge in competition for beyond-context-free
processing. Work is in progress to close that gap. The same factors that make
incremental evaluation more difficult for equational programs will also make a good 
solution more valuable than that for attribute grammars. The optimal reevaluation 
strategy for attribute grammars treats attributes and the functions that compute 
one attribute from another as primitives. The reevaluation strategy only deter- 
mines which attributes should be recomputed, it does not treat the problem of 
incremental reevaluation of an individual attribute with significant structure. Yet, 
one of the most critical performance issues for attribute reevaluation is avoidance 
of multiple copies and excess reevaluation of components of highly structured attri- 
butes, especially symbol tables. A lot of work on special cases of reevaluation of 
large, structured attributes is underway, but we are not aware of any thoroughly 
general approach. A good incremental evaluation strategy for equational programs 
will inherently solve the total problem, since the expression model of data
underlying the equation interpreter makes all structure in data explicit, rather than
allowing large structures to be treated as primitives.


All of the preceding discussion of input and output ignores the possibility of 
sharing of identical subexpressions in an expression. The equation interpreter could 
not achieve an acceptable performance in many cases without such sharing. 
Perhaps such sharing should be accommodated in the treatment of input and
output as well, but careful thought is required to do so without distasteful complexity.


15.4. Dynamic Programming 


Dynamic programming may be viewed as a general technique for transforming an 
inefficient recursive program into a more efficient one that stores some portion of 
the graph of the recursively defined function in an array, in order to avoid
recomputation of function values that are used repeatedly. In a typical application of
dynamic programming, the user must completely specify the way in which the
graph of the function is to be arranged in an array, and the order in which the 
graph is to be computed. The latter task may be handled automatically by the 
equation interpreter. To illustrate this automation of part of the dynamic program- 
ming task, we give equations for the optimal matrix multiplication problem of 
[AHU74]. Instead of defining only a small finite part of the graph of the cost 
function, we define the infinite graph, and the outermost evaluation strategy of the 
equation interpreter guarantees that only the relevant part of the infinite graph is 
actually computed. 

Example 15.4.1

The following equations solve the optimal matrix multiplication problem from The
Design and Analysis of Computer Algorithms, by Aho, Hopcroft and Ullman,
section 2.8. Given a sequence of matrix dimensions, (d_0 d_1 ... d_m), the problem is to
find the least cost for multiplying out a sequence of matrices M_1 × M_2 × ... × M_m,
where M_i is d_{i-1} × d_i, assuming that multiplying an i×j matrix by a j×k matrix to
get an i×k matrix costs i×j×k. There is an obvious, but exponentially inefficient
recursive solution, expressed in a more liberal notation than that allowed by the
equation interpreter:


cost[(d_0 ... d_m)] =
   min{cost[(d_0 ... d_i)] + cost[(d_i ... d_m)] + d_0×d_i×d_m | 0<i<m}


cost[(d_0 d_1)] = 0


The problem with the recursive solution above is that certain values of the function 
cost are calculated repeatedly in the recursion. Dynamic programming yields a 
polynomial algorithm, by storing the values of cost in an array, and computing
each required value only once. In the following equations, the function cost is
represented by an infinite-dimensional infinite list giving the graph of the function:


costgraph[()] =

(0 (cost[(1)] (cost[(1 1)] (cost[(1 1 1)] ... )
                           (cost[(1 1 2)] ... )
                           ... )
              (cost[(1 2)] (cost[(1 2 1)] ... )
                           (cost[(1 2 2)] ... )
                           ... )
              ... )
   (cost[(2)] (cost[(2 1)] (cost[(2 1 1)] ... ) ... )
              ... )
   ... )


That is, cost[(d_0 ... d_m)] is the first element of the list which is element d_m+1 of
element d_{m-1}+1 of ... element d_0+1 of costgraph[()]. cost[(i)] is always 0, but
explicit inclusion of these 0s simplifies the structure of costgraph. costgraph[a],
for a ≠ (), is the fragment of costgraph[()] whose indexes are all prefixed by a.

Symbols

: operators directly related to the computation of cost
cost: 1;
costgraph: 1;
costrow: 2;
reccost: 1;
subcosts: 2;

: list-manipulation, logical, and arithmetic operators
cons: 2;
nil: 0;
min: 1;
index: 2;
length: 1;
element: 2;
equ: 2;
less: 2;
subtract: 2;
multiply: 2;
include integer_numerals, truth_values.


For all a, b, i, j, k, x, y:

cost[a] = index[a; costgraph[()]];

: costgraph[a] is the infinite graph of the cost function for arguments starting
: with the prefix a.
costgraph[a] = (reccost[a] . costrow[a; 1]);

: costrow[a; i] is the infinite list
: (costgraph[ai] costgraph[ai+1] ... )
: where ai is a with i added on at the end.
costrow[a; i] =
   (costgraph[addend[a; i]] . costrow[a; add[i; 1]]);

: reccost[a] has the same value as cost[a], but is defined by the recursive equations
: from the header.
reccost[(i j)] = 0;  reccost[(i)] = 0;  reccost[()] = 0;

reccost[(i j . a)] = min[subcosts[(i j . a); length[a]]]
   where a is (k . b) end where;

: subcosts[a; i] is a finite list of the recursively computed costs of (d0 ... dm),
: fixing the last index removed at i, i-1, ... 1.
subcosts[a; i] =
   if[equ[i; 0]; ();
      (add[add[cost[firstn[add[i; 1]; a]]; cost[aftern[i; a]]];
           multiply[multiply[first[a]; element[add[i; 1]; a]]; last[a]]]
       . subcosts[a; subtract[i; 1]])];

: Definitions of list-manipulation operators, logical and arithmetical operators.
min[(i)] = i;
min[(i . a)] = if[less[i; min[a]]; i; min[a]]
   where a is (k . b) end where;

index[(); (x . b)] = x;
index[(i . a); x] = index[a; element[add[i; 1]; x]];

length[()] = 0;
length[(x . a)] = add[length[a]; 1];

element[i; (x . a)] = if[equ[i; 1]; x; element[subtract[i; 1]; a]];

firstn[i; a] = if[equ[i; 0]; (); (first[a] . firstn[subtract[i; 1]; tail[a]])];
first[(x . a)] = x;  tail[(x . a)] = a;

aftern[i; a] = if[equ[i; 0]; a; aftern[subtract[i; 1]; tail[a]]];

last[(x)] = x;
last[(x y . a)] = last[(y . a)];

addend[(); y] = (y);
addend[(x . a); y] = (x . addend[a; y]);

if[true; x; y] = x;  if[false; x; y] = y;

include addint, equint, subint, multint.

□


Although the algorithm in [AHU74] runs in time O(n³) on a problem of size
n, the equation interpreter, running the equations above, takes time O(n⁴) because
of the linear search required to look up elements of the graph of the cost
function. By structuring the graph as an appropriate search tree, the time could
be reduced to O(n³ log n), but there is no apparent way to achieve the cubic time
bound, since the equation interpreter has no provision for constant-time array
indexing. Even the search-tree implementation requires some careful thought in
the presence of infinite structures. Section 16.3 shows how search tree operations
may be defined by equations.

Avoidance of recomputation of function values in the example above depends
on the sharing strategy of the equation interpreter, described in Section 18.4. A
future version of the interpreter will provide, as an option, an alternate
implementation of evaluation based on the congruence closure algorithm [NO80, Ch80]. In
that implementation, all recomputation will be avoided in every context, and the
naive recursive equations can be used without the exponential cost.
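
The effect of running the naive recursive equations with all recomputation
avoided can be imitated in Python by memoizing the recursive function. The
sketch below is an analogy, not a description of the interpreter: lru_cache
from the standard library stands in for sharing or congruence closure, the
function name cost follows the example, and the dimension sequence used for the
check is an assumed test case. The chain is split at the shared dimension d_i,
so the two subproblems are (d_0 ... d_i) and (d_i ... d_m).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def cost(dims):
    """Least cost of multiplying out matrices M_1 ... M_m, where M_i is
    dims[i-1] x dims[i]; dims is a tuple (d_0, d_1, ..., d_m)."""
    m = len(dims) - 1
    if m <= 1:
        return 0                      # zero or one matrix: nothing to multiply
    return min(
        cost(dims[: i + 1]) + cost(dims[i:]) + dims[0] * dims[i] * dims[m]
        for i in range(1, m)          # 0 < i < m
    )

# Assumed test case: matrices of sizes 10x20, 20x50, 50x1, 1x100.
best = cost((10, 20, 50, 1, 100))
```

For this sequence the least cost is 2200, obtained by multiplying the second
and third matrices first, then the first matrix by that product, and the fourth
matrix last. Each distinct subsequence of dimensions is evaluated only once,
because repeated calls are answered from the cache.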


16. Implementing Efficient Data Structures In Equational Programs

The usefulness of the equation interpreter in its current form is severely limited by 
the lack of carefully-designed data structures. Instead of providing many choices 
of data structures, the equation interpreter provides the raw materials from which 
different data structures may be built. In order to make the system usable for any 
but small problems, a library of predefined data structures must be built. The 
modular constructs from Section 14 will then be used to incorporate appropriate 
definitions of data structures into a program as they are needed. This section 
demonstrates the basic techniques used to define some popular efficient data
structures.


16.1. Lists


Implementation of LISP-style lists in the equation interpreter is so straightforward 
that it is contained in several examples earlier in the text. We define the symbols 
car, cdr, cons, nil, equ, atom, and null, according to LISP usage, except that nil
is not taken to be an atomic symbol.


Example 16.1.1

Symbols

: List constructors.
cons: 2;
nil: 0;
include atomic_symbols;

: Selectors for the head, tail of a list.
car, cdr: 1;

: Predicates on lists.
equ: 2;
atom: 1;
null: 1;

: Boolean symbols.
and: 2;
include truth_values.

For all x, x1, x2, l, l1, l2:

car[(x . l)] = x;

cdr[(x . l)] = l;

equ[(); ()] = true;
equ[(x1 . l1); (x2 . l2)] = and[equ[x1; x2]; equ[l1; l2]];

equ[(x1 . l1); x2] = false
   where x2 is either () or in atomic_symbols end or end where;

equ[x1; x2] = false
   where x1 is in atomic_symbols, x2 is either () or (x . l) end or end where;

equ[(); x2] = false
   where x2 is either (x . l) or in atomic_symbols end or end where;

include equatom;

atom[x] = true
   where x is in atomic_symbols end where;
atom[()] = false;
atom[(x . l)] = false;

null[()] = true;
null[(x . l)] = false;
null[x] = false
   where x is in atomic_symbols end where;

and[true; true] = true;
and[true; false] = false;
and[false; true] = false;
and[false; false] = false.

□


The implementation of lists described above is sufficient for all programming 
in the style of pure LISP. It is not necessarily the implementation of choice for all 
list applications. The following equational program defines nonempty lists using 
cat (concatenate two lists) and list (create a list of one element) as constructors 
instead of cons. The resulting notation for lists no longer shows the LISP prejudice 
toward processing from first to last, so, instead of car and cdr, the four selectors 
first, last, head (all but the last), and tail (all but the first) are given. Since 
empty lists are not represented, it is appropriate to have the test singleton instead 
of null. 


Example 16.1.2 

Symbols 

: List constructors. 
cat: 2; 
list: 1; 
include atomic_symbols; 

: List selectors. 
first: 1; 
last: 1; 
head: 1; 
tail: 1; 

: Predicates on lists. 
equ: 2; 
singleton: 1. 

For all x, l1, l2, l3: 

first(list(x)) = x; 

first(cat(l1, l2)) = first(l1); 

last(list(x)) = x; 

last(cat(l1, l2)) = last(l2); 

head(cat(l1, cat(l2, l3))) = cat(l1, head(cat(l2, l3))); 
head(cat(l1, list(x))) = l1; 

tail(cat(cat(l1, l2), l3)) = cat(tail(cat(l1, l2)), l3); 
tail(cat(list(x), l3)) = l3; 

equ(list(x1), list(x2)) = equ(x1, x2); 

equ(l1, l2) = and(equ(first(l1), first(l2)), equ(tail(l1), tail(l2))) 
   where l1, l2 are cat(l3, l4) end where; 

equ(list(x), cat(l1, l2)) = false; 
equ(cat(l1, l2), list(x)) = false; 

include equatom; 

singleton(list(x)) = true; 
singleton(cat(l1, l2)) = false; 

: Use the same definition of and as in Example 16.1.1. 

□ 


The list-cat representation of lists differs from the LISP version in making 
concatenation just as cheap as adding to the head, at the expense of an increase in the 
cost of producing the first element of a list. Perhaps more significant is the effect 
on infinite lists. A LISP list may only be infinite to the right; lists constructed 
from cat may have infinite heads as well as infinite tails, and even infinite 
intermediate segments. 
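
The trade-off described above can be tried out in a conventional language. The following Python sketch mirrors the list and cat constructors of Example 16.1.2 with two hypothetical classes, Single and Cat; these names, and the helper to_pylist, are assumptions made for illustration and are not part of the equation interpreter. Unlike the interpreter, Python evaluates eagerly, so the sketch covers finite lists only.

```python
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass(frozen=True)
class Single:
    """Stands in for list(x): a list of exactly one element."""
    item: Any


@dataclass(frozen=True)
class Cat:
    """Stands in for cat(l1, l2): the concatenation of two nonempty lists."""
    left: "Seq"
    right: "Seq"


Seq = Union[Single, Cat]


def first(s: Seq) -> Any:
    # first(list(x)) = x;  first(cat(l1, l2)) = first(l1)
    while isinstance(s, Cat):
        s = s.left
    return s.item


def last(s: Seq) -> Any:
    # last(list(x)) = x;  last(cat(l1, l2)) = last(l2)
    while isinstance(s, Cat):
        s = s.right
    return s.item


def tail(s: Seq) -> Seq:
    # tail(cat(list(x), l3)) = l3
    # tail(cat(cat(l1, l2), l3)) = cat(tail(cat(l1, l2)), l3)
    if isinstance(s, Single):
        raise ValueError("the tail of a singleton would be empty")
    if isinstance(s.left, Single):
        return s.right
    return Cat(tail(s.left), s.right)


def head(s: Seq) -> Seq:
    # head(cat(l1, list(x))) = l1
    # head(cat(l1, cat(l2, l3))) = cat(l1, head(cat(l2, l3)))
    if isinstance(s, Single):
        raise ValueError("the head of a singleton would be empty")
    if isinstance(s.right, Single):
        return s.left
    return Cat(s.left, head(s.right))


def to_pylist(s: Seq) -> List[Any]:
    """Flatten to a Python list, for inspection only."""
    if isinstance(s, Single):
        return [s.item]
    return to_pylist(s.left) + to_pylist(s.right)
```

Building Cat(a, b) takes constant time whatever the sizes of a and b, while first and last must walk down one spine of the tree, which is the cost shift the paragraph above describes.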


The list-cat style of lists is symmetric with respect to treatment of the first 
and last elements, but it still makes production of intermediate elements more 
clumsy than firsts and lasts. A rather cute application of error markers, as 
described in Section 12.3, minimizes the clumsiness. In the following equations, 
select(n, l) selects, if possible, the nth element from the list l. short(i) reports 
that a list was short by i elements for the purpose of producing a specified element. 


Example 16.1.3 

select(n, list(x)) = if(equ(n, 1), x, short(subtract(n, 1))); 
select(n, cat(l1, l2)) = 
   if(tooshort(select(n, l1)), 
      select(shortby(select(n, l1)), l2), 
      select(n, l1)); 
tooshort(short(n)) = true; 
tooshort(x) = false 
   where x is in atomic_symbols end where; 
shortby(short(n)) = n; 
if(true, x, y) = x; if(false, x, y) = y; 

include subint. 

□ 
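
The error-marker technique can be sketched in Python as follows. The classes Single, Cat, and Short are hypothetical stand-ins for list, cat, and short, introduced only for this illustration. The sketch assumes that list elements are never Short values themselves, just as the equations assume that elements are atomic symbols.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Single:
    """Stands in for list(x)."""
    item: Any


@dataclass(frozen=True)
class Cat:
    """Stands in for cat(l1, l2)."""
    left: Any
    right: Any


@dataclass(frozen=True)
class Short:
    """Stands in for short(i): the list searched was i elements too short."""
    by: int


def select(n: int, s: Union[Single, Cat]) -> Any:
    """Return the nth element of s (counting from 1), or a Short marker."""
    if isinstance(s, Single):
        return s.item if n == 1 else Short(n - 1)
    found = select(n, s.left)
    if isinstance(found, Short):
        # The left part ran out; the shortfall is the position to look up
        # in the right part.
        return select(found.by, s.right)
    return found
```

A failed search of the left operand is not wasted: the marker it returns says exactly where to resume in the right operand, so no separate length computation is needed.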


In order to give similar treatment to all elements of a list, an extension to the 
cat notation is required. listlnth(n, l) marks the list l as having length n. As a 
special case, listlnth(1, x) represents the singleton list of x, replacing list(x) in the 
earlier examples. This variation implies a slightly different semantic view of lists, 
where a singleton list is the same thing as its sole element. Regrettably, an 
additional symbol icat (inactive concatenation) is required for concatenations that 
have already been associated with their lengths, in order to avoid an infinite 
computation for a concatenation. The following equations may be used to redefine 
select in the new notation. 


Example 16.1.4 

cat(listlnth(n1, l1), listlnth(n2, l2)) = 
   listlnth(add(n1, n2), icat(listlnth(n1, l1), listlnth(n2, l2))); 

select(n, listlnth(n1, l)) = if(less(n1, n), 
                               short(subtract(n, n1)), 
                               select(n, l)); 
select(1, x) = x where x is in atomic_symbols end where; 

select(n, icat(listlnth(n1, l1), listlnth(n2, l2))) = 
   if(less(n1, n), 
      select(subtract(n, n1), listlnth(n2, l2)), 
      select(n, l1)). 

□ 


In addition to its clumsiness, the listlnth notation suffers from an inability to deal 
with any sort of infinite list. 
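
The idea of recording lengths can be illustrated with a short Python sketch. It simplifies the listlnth notation: instead of a separate length wrapper, each concatenation node stores its own size. The names Leaf, Node, size, and cat are assumptions of the sketch rather than symbols of the equational program.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Leaf:
    """A single element, which counts as a list of length 1."""
    item: Any


@dataclass(frozen=True)
class Node:
    """A concatenation that remembers its total length."""
    size: int
    left: Any
    right: Any


def size(s: Union[Leaf, Node]) -> int:
    return s.size if isinstance(s, Node) else 1


def cat(a: Union[Leaf, Node], b: Union[Leaf, Node]) -> Node:
    # The length is computed once, when the concatenation is built.
    return Node(size(a) + size(b), a, b)


def select(n: int, s: Union[Leaf, Node]) -> Any:
    """Return the nth element (counting from 1) without searching both sides."""
    if not 1 <= n <= size(s):
        raise IndexError(n)
    while isinstance(s, Node):
        left_size = size(s.left)
        if n > left_size:
            n -= left_size
            s = s.right
        else:
            s = s.left
    return s.item
```

Because cat must know the sizes of both operands before it can build a node, both operands have to be finite, which matches the limitation noted above.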


Even using one of the variations of cat notation, provision should probably be 
made for the empty list. The easiest way to add the empty list is to use the nullary 
symbol empty, and allow any number of emptys to appear in a list, ignored by all 
operations on that list. It is easy to modify the definitions of first, last, head, tail, 
etc. to ignore emptys. The more efficient, but syntactically clumsier, solution is to 
eliminate empty wherever it appears in a concatenation. Unfortunately, the obvious 
solution of adding the equations cat(empty(), l) = l and cat(l, empty()) = l 
will not work because of the restrictions on left-hand sides of equations. First, 
these two equations have a common instance in cat(empty(), empty()). That problem 
is easily avoided (at the cost of extra evaluation steps) by changing the second 
equation to 

cat(l, empty()) = l 
   where l is either list(x) or cat(l1, l2) end or end where; 


There is still a problem with overlap of the new equations with every other equation 
describing a function recursively by its effect on a concatenation, for instance 
the second equation in Example 16.1.2, first(cat(l1, l2)) = first(l1). In order to 
avoid those overlaps, we must introduce the inactive concatenation symbol, icat, 
using it in place of cat as the list constructor, and adding the equations 

cat(empty(), l) = l; 

cat(l, empty()) = l 
   where l is either list(x) or icat(l1, l2) end or end where; 

cat(l1, l2) = icat(l1, l2) 
   where l1, l2 are either list(x) or icat(l1, l2) end or end where. 

This is another example of the last technique for removing overlaps described in 
Section 12.4. 
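
In a conventional language the same split between an active symbol and an inactive constructor appears as a "smart constructor": a function cat that does the simplification, and a plain data constructor that is only ever built in simplified form. The following Python sketch is an illustration under that reading; the class names Empty, Single, and ICat are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Empty:
    """Stands in for empty()."""


@dataclass(frozen=True)
class Single:
    """Stands in for list(x)."""
    item: Any


@dataclass(frozen=True)
class ICat:
    """Stands in for icat(l1, l2): a concatenation known to contain no empties."""
    left: Any
    right: Any


EMPTY = Empty()


def cat(a: Any, b: Any) -> Any:
    """Active concatenation: removes empties, then builds the inactive form."""
    if isinstance(a, Empty):
        return b          # cat(empty(), l) = l
    if isinstance(b, Empty):
        return a          # cat(l, empty()) = l, for nonempty l
    return ICat(a, b)     # cat(l1, l2) = icat(l1, l2), both nonempty
```

Functions such as first and last can then be defined on Single and ICat alone, since no Empty can occur inside an ICat built through cat.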


While the variations on list representation described above avoid a certain 
amount of unnecessary searching, and appear to have a heuristic value, their worst 
case operations involve complete linear searches. To improve on the worst case 
time for list operations, we must balance the tree representations of lists. Such 
balanced list representations may be derived by taking the search tree representations 
of Section 16.3, and omitting the keys. 


16.2. Arrays 


There is no way to implement arrays with constant access time in the equation 
interpreter. Such arrays could be provided as predefined objects with predefined 
operations, in the style of the arithmetic operations, but only at the cost of a 
substantial increase in the conceptual complexity of storage management. Instead, we 
propose to implement arrays as balanced trees, accepting a logarithmic rather than 
constant access time. The following definitions implement one-dimensional arrays 
ranging over arbitrary subranges of the integers. Three constructors are used: 
array(i, j, a) denotes an array with indexes ranging from i to j, with contents 
described by a. In describing the contents, arrbranch(a1, a2) denotes an array, or 
subarray, in which locations indexed by integers ending with binary bit 0 are given 
in a1, and those indexed by integers ending with binary bit 1 are given in a2. 
arrelement(x) denotes the single array element with value x. element(i, a) 
produces the element indexed by i in the array a. constarray(i, j, x) produces an 
array, indexed from i to j, containing (j - i) + 1 copies of x. update(a, i, x) is 
the array a with the element indexed by i changed to the value x. 


Example 16.2.1 

Symbols 

: Array constructors. 
array: 3; 
arrbranch: 2; 
arrelement: 1; 

: Array selector. 
element: 2; 

: Array initializer. 
constarray: 3; 

: Array modifier. 
update: 3; 

: Functions used internally for array computations. 
const: 2; 

: Arithmetic operators for index computations. 
equ: 2; 
subtract, divide, modulo: 2; 
include integer_numerals, truth_values; 
if: 3. 

For all i, j, k, a, a1, a2, x, y: 

element(i, array(j, k, a)) = element(subtract(i, j), a); 
element(i, arrbranch(a1, a2)) = 
   if(equ(modulo(i, 2), 0), 
      element(divide(i, 2), a1), 
      element(divide(i, 2), a2)); 

element(0, arrelement(x)) = x; 

constarray(i, j, x) = array(i, j, const(subtract(j, subtract(i, 1)), x)); 
const(i, x) = if(equ(i, 1), 
                 arrelement(x), 
                 arrbranch(const(subtract(i, divide(i, 2)), x), 
                           const(divide(i, 2), x))); 

update(array(i, j, a), k, x) = array(i, j, update(a, subtract(k, i), x)); 
update(arrbranch(a1, a2), i, x) = 
   if(equ(modulo(i, 2), 0), 
      arrbranch(update(a1, divide(i, 2), x), a2), 
      arrbranch(a1, update(a2, divide(i, 2), x))); 

update(arrelement(y), 0, x) = arrelement(x); 

include equint, modint, divint, subint. 

□ 
In principle, the constructor arrelement is superfluous, and array elements could 
appear directly as arguments to arrbranch. In that case, the equation 
element(0, arrelement(x)) = x would be replaced by element(0, x) = x, with a 
where clause restricting x to whatever syntactic forms were allowed for array 
elements. The version given above has the advantage of allowing any sorts of 
elements for arrays, including other arrays. Multidimensional arrays are easily 
represented by arrays whose elements are arrays of dimension one less. 
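
The tree-structured array can be sketched in Python as a persistent structure indexed by the bits of the offset, lowest bit first. The class names Elem, Branch, and Array and the helper names are hypothetical, chosen to echo arrelement, arrbranch, and array. The sketch sizes the two halves of each branch so that every offset in range reaches a leaf, and it builds the whole tree eagerly, whereas the equation interpreter would expand it only on demand.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Elem:
    """Stands in for arrelement(x)."""
    value: Any


@dataclass(frozen=True)
class Branch:
    """Stands in for arrbranch(a1, a2): even offsets, then odd offsets."""
    even: Any
    odd: Any


@dataclass(frozen=True)
class Array:
    """Stands in for array(i, j, a): bounds plus tree-structured contents."""
    lo: int
    hi: int
    body: Any


def const(n: int, x: Any) -> Any:
    """Contents for n >= 1 copies of x at offsets 0 .. n-1."""
    if n == 1:
        return Elem(x)
    # Offsets 0 .. n-1 contain ceil(n/2) even and floor(n/2) odd numbers.
    return Branch(const(n - n // 2, x), const(n // 2, x))


def constarray(lo: int, hi: int, x: Any) -> Array:
    return Array(lo, hi, const(hi - lo + 1, x))


def element(i: int, arr: Array) -> Any:
    if not arr.lo <= i <= arr.hi:
        raise IndexError(i)
    offset, node = i - arr.lo, arr.body
    while isinstance(node, Branch):
        node = node.even if offset % 2 == 0 else node.odd
        offset //= 2
    return node.value


def _update(node: Any, offset: int, x: Any) -> Any:
    if isinstance(node, Elem):
        return Elem(x)
    if offset % 2 == 0:
        return Branch(_update(node.even, offset // 2, x), node.odd)
    return Branch(node.even, _update(node.odd, offset // 2, x))


def update(arr: Array, i: int, x: Any) -> Array:
    """Return a new array; the old one is unchanged and mostly shared."""
    if not arr.lo <= i <= arr.hi:
        raise IndexError(i)
    return Array(arr.lo, arr.hi, _update(arr.body, i - arr.lo, x))
```

An update copies only the nodes on one root-to-leaf path, so both the time and the extra space per update are logarithmic in the index range, and the untouched subtrees are shared between the old and new arrays.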


Notice that the outermost evaluation strategy of the equation interpreter 
causes constarray and update expressions to be worked out only as far as required 
to accommodate element operations. So, some condensation of sparsely accessed 
arrays is provided automatically. It is very tempting to allow the element 
operation to take a greater advantage of sparseness, using the equation 
element(i, const(j, x)) = x. Unfortunately, this equation overlaps with the 
recursive definition of const, violating the syntactic restrictions of Section 5. In this 
case, the violation is clearly benign, and future versions of the interpreter should 
relax the restrictions to allow overlaps of this sort. Further research is required to 
find an appropriate formal description of a useful class of benign overlaps. Such a 
formalism will probably be based on the Knuth-Bendix closure algorithm [KB70], 
extended to deal with possibly nonterminating reductions. 


The explicit use of index bounds in array(i, j, a) prohibits infinite arrays, but 
infinite arrays may be implemented with the arrbranch constructor, assuming that 
the range always starts at 0. Arrays infinite on the left as well as the right are 
easily provided by piecing together two arrbranch structures, and using an initial 
comparison with 0 to direct operations to the appropriate portions. 


In the array implementation described above, indexes are grouped in terms 
according to their agreement on low-order bits. E.g., the even indexes all go to 
the left of the first branch point, and the odd indexes go to the right. For 
applications involving only element and update operations, the order of indexes is 
irrelevant, and this one was chosen for arithmetic simplicity. If other primitives, 
operating on numerically contiguous sets of indexes, are desired, the definitions 
may be modified to treat bits of an index from most significant to least significant, 
at the price of slightly more complex arithmetic expressions in the definitions. This 
rearrangement of indexes will also affect the benefits of sparse access by changing 
the structure of contiguous elements in the data structure. 


16.3. Search Trees and Tables. 


The arrays of Section 16.2 avoid explicit mention of index values in the data 
structure by assuming that those values are chosen from a contiguous range. Sparseness 
in the use of the indexes may allow savings in space, due to the outermost 
evaluation strategy leaving some instances of const unevaluated, but the time cost of each 
element and update operation will be proportional to the logarithm of the total 
index range, no matter how sparsely that range is used. When the range of legal 
indexes is so great that even this logarithmic cost is not acceptable, balanced 
search trees should be used instead of arrays. At the cost of storing index values 
(usually called keys in this context) as well as element values, we can let the cost 
of access be proportional to the logarithm of the number of keys actually used, 
rather than the total range of keys. This section shows three alternative definitions 
of balanced search trees. The first two were developed by Christoph Hoffmann, 
and the description is adapted from [HO82b]. 


One popular balanced tree scheme that can be implemented by equations is 
based on 2-3 trees, a special case of B-trees. Informally, a 2-3 tree is a data 
structure with the following properties. In a 2-3 tree, there are 2-nodes, with two 
sons, and 3-nodes, with three sons. A 2-node is labelled by a single key a, so that 
the keys labelling nodes in the left subtree are all smaller than a, and the keys 
labelling nodes in the right subtree are larger than a. A 3-node is labelled by two 
keys a and b, with a<b, so that the keys in the left subtree are all smaller than a, 
the keys in the middle subtree are between a and b, and the keys in the right 
subtree are larger than b. A leaf is a node whose subtrees are all empty. A 2-3 tree 
is perfectly balanced, i.e., the path lengths from the root to the leaves of the tree 
are all equal. Figure 16.3.1 below shows an example of a 2-3 tree. In this section, 
we will always show search trees as collections of keys only, with a membership 
test. It is straightforward to add arbitrary information associated with each key, 
and augment the membership test to produce that information. The augmentation 
is easy, but obscures the more interesting issues having to do with balancing the 
distribution of the keys, so it is omitted. 


Figure 16.3.1 


When inserting a new key k into a 2-3 tree, there are two cases to consider. 
If the proper place for inserting k is a 2-node leaf, then we simply convert the leaf 
to a 3-node. If the insertion should be made into a 3-node leaf, then we must 
somehow restructure parts of the tree to make space for k. The restructuring 
proceeds as follows. First form a 4-node, that is, a node with three keys a, b, and 
c, and four subtrees, as shown in Figure 16.3.2(a). Of course, if we begin with a 
leaf, then the subtrees are all empty. Now split the 4-node into three 2-nodes as 
shown in Figure 16.3.2(b). The key of the middle 2-node must be inserted into the 
node that is the former father of the 4-node, since, through the splitting, we have 
increased the number of sons. If the father node is a 2-node, then it becomes a 
3-node; otherwise the splitting process is repeated on the father's level. If there is no 
father, i.e., if we have just split the tree root, then no further work is required. 
Note that without the insertion of the middle node into the father we would 
destroy the balance of the tree. 

Figure 16.3.2 
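
The insertion procedure just described can be sketched in Python. The encoding is an assumption made for illustration: None is the empty subtree, tuples tagged "t2" and "t3" are the two node types, and a tuple tagged "put" is a temporary signal that a subtree has split and is handing its middle key to its father. The functions t2 and t3 act as constructors that absorb such a signal when they see one.

```python
E = None  # the empty subtree


def is_put(t):
    return t is not E and t[0] == "put"


def t2(x1, l1, x2):
    """Build a 2-node; if a son has split, grow into a 3-node instead."""
    if is_put(x1):
        _, x, k, y = x1
        return ("t3", x, k, y, l1, x2)
    if is_put(x2):
        _, x, k, y = x2
        return ("t3", x1, l1, x, k, y)
    return ("t2", x1, l1, x2)


def t3(x1, l1, x2, l2, x3):
    """Build a 3-node; if a son has split, split as well and pass a key up."""
    if is_put(x1):
        _, x, k, y = x1
        return ("put", ("t2", x, k, y), l1, ("t2", x2, l2, x3))
    if is_put(x2):
        _, x, k, y = x2
        return ("put", ("t2", x1, l1, x), k, ("t2", y, l2, x3))
    if is_put(x3):
        _, x, k, y = x3
        return ("put", ("t2", x1, l1, x2), l2, ("t2", x, k, y))
    return ("t3", x1, l1, x2, l2, x3)


def insert(k, t):
    """Insert k below t; the result may be a 'put' signal for the father."""
    if t is E:
        return ("put", E, k, E)
    if t[0] == "t2":
        _, x1, l1, x2 = t
        if k == l1:
            return t
        if k < l1:
            return t2(insert(k, x1), l1, x2)
        return t2(x1, l1, insert(k, x2))
    _, x1, l1, x2, l2, x3 = t
    if k == l1 or k == l2:
        return t
    if k < l1:
        return t3(insert(k, x1), l1, x2, l2, x3)
    if k < l2:
        return t3(x1, l1, insert(k, x2), l2, x3)
    return t3(x1, l1, x2, l2, insert(k, x3))


def tree_insert(k, root):
    """Insert at the root; a split of the root adds one level to the tree."""
    result = insert(k, root)
    if is_put(result):
        _, x, key, y = result
        return ("t2", x, key, y)
    return result
```

The tree grows in height only in tree_insert, when the root itself splits, which is why all leaves stay at the same depth.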

We denote a 2-3 tree by an expression tree(t), where t represents the labelling 
and structure of the tree. A nonempty subtree with a 2-node root is written 
t2(x1, l1, x2), where x1 and x2 represent its left and right subtrees, and l1 is the 
label of the node. Similarly, t3(x1, l1, x2, l2, x3) denotes a 3-node with labels l1 
and l2 and subtrees x1, x2, and x3. The constant e denotes the empty subtree. 


Example 16.3.1 
The 2-3 tree of Figure 16.3.1 is represented by 
tree(t2(t2(t2(e(), 1, e()), 
13000, 3, 0, 4, e0)), 
7311300, 11, eO, 12, e0), 
C0, 15, eQ), 
300, 20, eQ)))) 


□ 


Now we program the insertion of a key k into a 2-3 tree tree(x). Insertion of 
a key k proceeds by first locating the leaf in which to insert k. For this purpose, 
we must compare k to the labels of a node. If the comparison detects equality, 
then k is already in the tree and the insertion is done. Otherwise, the result of the 
comparison determines a subtree into which k is inserted. 


Example 16.3.2 

Symbols 

: Constructors for search trees. 
: tree, t2, and t3 are active symbols, as well as constructors. 
tree: 1; 
e: 0; 
include atomic_symbols, integer_numerals; 

: Operations on search trees. 
insert: 2; 
member: 2; 

: Symbols used in the definition of insert. 
put: 3; 

: Arithmetic and logical operators. 
less: 2; 
equ: 2; 
if: 3. 

For all k, l, l1, l2, x, x1, x2, x3, y: 

: insert(k, x) inserts the key k in the tree x. 
insert(k, tree(x)) = tree(insert(k, x)); 
insert(k, t2(x, l, y)) = 
   if(equ(k, l), t2(x, l, y), 
      if(less(k, l), t2(insert(k, x), l, y), 
         t2(x, l, insert(k, y)))); 
insert(k, t3(x1, l1, x2, l2, x3)) = 
   if(or(equ(k, l1), equ(k, l2)), t3(x1, l1, x2, l2, x3), 
      if(less(k, l1), t3(insert(k, x1), l1, x2, l2, x3), 
         if(less(k, l2), t3(x1, l1, insert(k, x2), l2, x3), 
            t3(x1, l1, x2, l2, insert(k, x3))))); 

insert(k, e()) = put(e(), k, e()); 

t2(put(x, k, y), l1, x2) = t3(x, k, y, l1, x2); 
t2(x1, l1, put(x, k, y)) = t3(x1, l1, x, k, y); 

t3(put(x, k, y), l1, x2, l2, x3) = put(t2(x, k, y), l1, t2(x2, l2, x3)); 
t3(x1, l1, put(x, k, y), l2, x3) = put(t2(x1, l1, x), k, t2(y, l2, x3)); 
t3(x1, l1, x2, l2, put(x, k, y)) = put(t2(x1, l1, x2), l2, t2(x, k, y)); 

tree(put(x, k, y)) = tree(t2(x, k, y)); 

if(true, x, y) = x; if(false, x, y) = y; 

include lessint, equint; 

: member(k, x) tests whether the key k occurs in the tree x. 

member(k, l) = equ(k, l) 
   where l is in atomic_symbols end where; 

member(k, e()) = false; 
member(k, tree(x)) = member(k, x); 
member(k, t2(x1, l1, x2)) = 
   if(less(k, l1), 
      member(k, x1), 
      if(equ(k, l1), 
         true, 
         member(k, x2))); 
member(k, t3(x1, l1, x2, l2, x3)) = 
   if(less(k, l1), 
      member(k, x1), 
      if(equ(k, l1), 
         true, 
         if(less(k, l2), 
            member(k, x2), 
            if(equ(k, l2), 
               true, 
               member(k, x3))))). 

□ 
The equations of Example 16.3.2 contain a program for inserting a key into a 2-3 
tree. Although the equations are very intuitive, they do not obey the restrictions of 
Section 5. For example, the second and fifth equations overlap in the expression 

   tree(insert(3, t2(put(e(), 2, e()), 4, e()))). 

The problem arises conceptually because the described insertion proceeds in two 
phases: a traversal from the tree root to the insertion point, followed by a reverse 
traversal restructuring the nodes encountered up to the nearest 2-node, or up to the 
root if no 2-node is found. The overlap in the expression above corresponds to the 
competition between two insertions, one in the restructuring phase, when equation 
5 applies, the other in the initial traversal, when equation 2 applies. 


The problem can be solved in the traditional manner of setting locks to 
prevent the progress of subsequent insertions where they may interfere with 
previous updates which are not completed. This is easily done by indicating a locked 
node as t2l(...) instead of t2(...), and t3l(...) instead of t3(...). The equations for 
this solution are given below. Note that we force complete sequentiality of 
insertions, because the root is locked in this solution. Notice also that the tree 
constructors t2 and t3, which were active in Example 16.3.2, are pure constructors in 
Example 16.3.3. The new symbols t2l and t3l acquire the active role. Since the 
symbols t2 and t3 may be thought of as being split into active versions, t2l and 
t3l, and inactive versions t2 and t3, this is another example of the last technique 
of Section 12.4 for repairing overlaps, although it was originally thought of in 
terms of record locking techniques from concurrent programming. 


Example 16.3.3 

Symbols 

: Constructors for search trees. 
tree: 1; 
e: 0; 
include atomic_symbols, integer_numerals; 

: Operations on search trees. 
insert: 2; 
member: 2; 

: Symbols used in the definition of insert. 
put: 3; 
treel: 1; 
t2l: 3; 
t3l: 5; 
unlock: 1; 

: Arithmetic and logical operators. 
less: 2; 
equ: 2; 
or: 2; 
if: 3. 

For all k, l, l1, l2, x, x1, x2, x3, y: 

: insert(k, x) inserts the key k in the tree x. 
insert(k, tree(x)) = treel(insert(k, x)); 

insert(k, t2(x1, l1, x2)) = 
   if(less(k, l1), t2l(insert(k, x1), l1, x2), 
      if(less(l1, k), t2l(x1, l1, insert(k, x2)), 
         unlock(t2(x1, l1, x2)))); 
insert(k, t3(x1, l1, x2, l2, x3)) = 
   if(or(equ(k, l1), equ(k, l2)), unlock(t3(x1, l1, x2, l2, x3)), 
      if(less(k, l1), t3l(insert(k, x1), l1, x2, l2, x3), 
         if(less(k, l2), t3l(x1, l1, insert(k, x2), l2, x3), 
            t3l(x1, l1, x2, l2, insert(k, x3))))); 

insert(k, e()) = put(e(), k, e()); 

t2l(put(x, k, y), l1, x2) = unlock(t3(x, k, y, l1, x2)); 
t2l(x1, l1, put(x, k, y)) = unlock(t3(x1, l1, x, k, y)); 

t3l(put(x, k, y), l1, x2, l2, x3) = put(t2(x, k, y), l1, t2(x2, l2, x3)); 
t3l(x1, l1, put(x, k, y), l2, x3) = put(t2(x1, l1, x), k, t2(y, l2, x3)); 
t3l(x1, l1, x2, l2, put(x, k, y)) = put(t2(x1, l1, x2), l2, t2(x, k, y)); 

treel(put(x, k, y)) = tree(t2(x, k, y)); 

t2l(unlock(x1), l1, x2) = unlock(t2(x1, l1, x2)); 
t2l(x1, l1, unlock(x2)) = unlock(t2(x1, l1, x2)); 

t3l(unlock(x1), l1, x2, l2, x3) = unlock(t3(x1, l1, x2, l2, x3)); 
t3l(x1, l1, unlock(x2), l2, x3) = unlock(t3(x1, l1, x2, l2, x3)); 
t3l(x1, l1, x2, l2, unlock(x3)) = unlock(t3(x1, l1, x2, l2, x3)); 

treel(unlock(x)) = tree(x); 
or(true, x) = true; or(false, x) = x; 

: The equations for if, equ, less, and member are the same as in Example 16.3.2. 

A different solution was proposed in [GS78], eliminating the need for locking 
the whole traversal path. The trick is to split the nodes encountered on the 
downward traversal if they do not permit the insertion of another key without splitting. 
Since a 3-node at the root of a 2-3 tree cannot be split on the way down without 
destroying the balance (there is no third key available), we must now deal with 
2-3-4 trees, permitting 4-nodes also with three keys l1, l2 and l3, represented by 

   t4(x1, l1, x2, l2, x3, l3, x4) 

The equational program becomes a little more complex, since we have an 
additional node type. Nonetheless, 14 equations suffice. The transformations required 
to eliminate a 4-node as point of insertion are given by Figure 16.3.3. Mirror 
image cases have been omitted. 

An equational definition of 2-3-4 tree insertion follows. Note that with this 
program we may insert keys in parallel without interference problems. Notice also 
that this solution respects the constructor discipline of Section 12.1. 


Figure 16.3.3 

Example 16.3.4 

Symbols 

: Constructors for search trees. 
tree: 1; 
e: 0; 
include atomic_symbols, integer_numerals; 

: Operations on search trees. 
insert: 2; 
member: 2; 

: Symbols used in the definition of insert. 
chk2: 5; 
chk3: 7; 

: Arithmetic and logical operators. 
less: 2; 
equ: 2; 
or: 2; 
if: 3. 

For all k, l, l1, l2, x, x1, x2, x3, y: 

: insert(k, x) inserts the key k in the tree x. 
insert(k, tree(e())) = tree(t2(e(), k, e())); 
insert(k, tree(t2(x1, l1, x2))) = tree(insert(k, t2(x1, l1, x2))); 

insert(k, tree(t3(x1, l1, x2, l2, x3))) = 
   tree(insert(k, t3(x1, l1, x2, l2, x3))); 

insert(k, tree(t4(x1, l1, x2, l2, x3, l3, x4))) = 
   tree(insert(k, t2(t2(x1, l1, x2), l2, t2(x3, l3, x4)))); 

insert(k, t2(x1, l1, x2)) = 
   if(less(k, l1), chk2(k, x1, x2, l1, 1), 
      if(less(l1, k), chk2(k, x2, x1, l1, 2), 
         t2(x1, l1, x2))); 

chk2(k, e(), y, l, i) = 
   if(equ(i, 1), 
      t3(e(), k, e(), l, e()), 
      t3(e(), l, e(), k, e())); 

chk2(k, t2(x1, l1, x2), y, l, i) = 
   if(equ(i, 1), 
      t2(insert(k, t2(x1, l1, x2)), l, y), 
      t2(y, l, insert(k, t2(x1, l1, x2)))); 

chk2(k, t3(x1, l1, x2, l2, x3), y, l, i) = 
   if(equ(i, 1), t2(insert(k, t3(x1, l1, x2, l2, x3)), l, y), 
      t2(y, l, insert(k, t3(x1, l1, x2, l2, x3)))); 

chk2(k, t4(x1, l1, x2, l2, x3, l3, x4), y, l, i) = 
   if(equ(k, l2), 
      if(equ(i, 1), 
         t2(t4(x1, l1, x2, l2, x3, l3, x4), l, y), 
         t2(y, l, t4(x1, l1, x2, l2, x3, l3, x4))), 
      if(less(k, l2), 
         if(equ(i, 1), 
            t3(insert(k, t2(x1, l1, x2)), l2, t2(x3, l3, x4), l, y), 
            t3(y, l, insert(k, t2(x1, l1, x2)), l2, t2(x3, l3, x4))), 
         if(equ(i, 1), 
            t3(t2(x1, l1, x2), l2, insert(k, t2(x3, l3, x4)), l, y), 
            t3(y, l, t2(x1, l1, x2), l2, insert(k, t2(x3, l3, x4)))))); 

insert(k, t3(x1, l1, x2, l2, x3)) = 
   if(or(equ(k, l1), equ(k, l2)), 
      t3(x1, l1, x2, l2, x3), 
      if(less(k, l1), 
         chk3(k, x1, x2, x3, l1, l2, 1), 
         if(less(k, l2), 
            chk3(k, x2, x1, x3, l1, l2, 2), 
            chk3(k, x3, x1, x2, l1, l2, 3)))); 

chk3(k, e(), x, y, l, m, i) = 
   if(equ(i, 1), 
      t4(e(), k, e(), l, e(), m, e()), 
      if(equ(i, 2), 
         t4(e(), l, e(), k, e(), m, e()), 
         t4(e(), l, e(), m, e(), k, e()))); 

chk3(k, t2(x1, l1, x2), x, y, l, m, i) = 
   if(equ(i, 1), 
      t3(insert(k, t2(x1, l1, x2)), l, x, m, y), 
      if(equ(i, 2), 
         t3(x, l, insert(k, t2(x1, l1, x2)), m, y), 
         t3(x, l, y, m, insert(k, t2(x1, l1, x2))))); 

chk3(k, t3(x1, l1, x2, l2, x3), x, y, l, m, i) = 
   if(equ(i, 1), 
      t3(insert(k, t3(x1, l1, x2, l2, x3)), l, x, m, y), 
      if(equ(i, 2), 
         t3(x, l, insert(k, t3(x1, l1, x2, l2, x3)), m, y), 
         t3(x, l, y, m, insert(k, t3(x1, l1, x2, l2, x3))))); 

chk3(k, t4(x1, l1, x2, l2, x3, l3, x4), x, y, l, m, i) = 
   if(equ(k, l2), 
      if(equ(i, 1), 
         t3(t4(x1, l1, x2, l2, x3, l3, x4), l, x, m, y), 
         if(equ(i, 2), 
            t3(x, l, t4(x1, l1, x2, l2, x3, l3, x4), m, y), 
            t3(x, l, y, m, t4(x1, l1, x2, l2, x3, l3, x4)))), 
      if(less(k, l2), 
         if(equ(i, 1), 
            t4(insert(k, t2(x1, l1, x2)), l2, 
               t2(x3, l3, x4), l, x, m, y), 
            if(equ(i, 2), 
               t4(x, l, insert(k, t2(x1, l1, x2)), 
                  l2, t2(x3, l3, x4), m, y), 
               t4(x, l, y, m, insert(k, t2(x1, l1, x2)), 
                  l2, t2(x3, l3, x4)))), 
         if(equ(i, 1), 
            t4(t2(x1, l1, x2), l2, 
               insert(k, t2(x3, l3, x4)), 
               l, x, m, y), 
            if(equ(i, 2), 
               t4(x, l, t2(x1, l1, x2), l2, 
                  insert(k, t2(x3, l3, x4)), m, y), 
               t4(x, l, y, m, t2(x1, l1, x2), l2, 
                  insert(k, t2(x3, l3, x4))))))); 

: The equations for if, equ, less, and member are the same as in Example 16.3.2. 

□ 

Computation using Example 16.3.4 proceeds as follows. Upon encountering a 
2-node or a 3-node, the node is locked (with chk2 or chk3) and the proper subtree 
for insertion is located. That subtree becomes the second parameter of the function 
chk2 or chk3. If the root of that subtree is a 4-node, then the 4-node and the 
locked parent are restructured according to the transformations of Figure 16.3.3. 
After restructuring, the parent node is released. If the root of the subtree is a 
3-node or a 2-node, then no restructuring is needed, and the parent node is released. 
If the subtree is empty, then the locked parent node is a leaf, and we insert the 
key. The equations also account for the possibility that the key to be inserted is in 
the tree already. Using these equations, we can only attempt inserting a key into a 
2-node or a 3-node, thus no upward traversal is needed, and insertions can be done 
in parallel. 
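
The single-pass discipline can be sketched in Python. The representation is an assumption of the sketch and is more uniform than the t2, t3, and t4 symbols of Example 16.3.4: a hypothetical Node class holds a tuple of one to three keys and a tuple of subtrees, with None for the empty tree. The invariant is the one stated above: the helper _insert is only ever called on a 2-node or a 3-node, so a key arriving from a split son always fits.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Node:
    keys: Tuple[int, ...]               # 1, 2, or 3 keys, in increasing order
    kids: Tuple[Optional["Node"], ...]  # len(keys) + 1 subtrees; None is empty


def _split(n: Node):
    """Split a 4-node into a left 2-node, its middle key, and a right 2-node."""
    return Node(n.keys[:1], n.kids[:2]), n.keys[1], Node(n.keys[2:], n.kids[2:])


def insert(k: int, root: Optional[Node]) -> Node:
    if root is None:
        return Node((k,), (None, None))
    if len(root.keys) == 3:
        # A 4-node at the root is split first, adding one level to the tree.
        left, mid, right = _split(root)
        root = Node((mid,), (left, right))
    return _insert(k, root)


def _insert(k: int, n: Node) -> Node:
    """Insert k below n, where n is guaranteed to be a 2-node or a 3-node."""
    if k in n.keys:
        return n
    i = bisect_left(n.keys, k)
    child = n.kids[i]
    if child is None:
        # n is a leaf with room for one more key.
        return Node(n.keys[:i] + (k,) + n.keys[i:], (None,) * (len(n.keys) + 2))
    if len(child.keys) == 3:
        if k == child.keys[1]:
            return n  # the key is already present; leave the 4-node alone
        # Split the 4-node son now, while its father still has room.
        left, mid, right = _split(child)
        n = Node(n.keys[:i] + (mid,) + n.keys[i:],
                 n.kids[:i] + (left, right) + n.kids[i + 1:])
        if k > mid:
            i += 1
        child = n.kids[i]
    return Node(n.keys, n.kids[:i] + (_insert(k, child),) + n.kids[i + 1:])
```

Every restructuring step looks only at a node and one of its sons, and nothing is ever passed back up the tree, which is the property that lets several insertions proceed at different depths without interfering.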


One more variant of balanced search is programmed below, for comparison. 
The algorithm expressed by these equations comes from the section on "top-down 
algorithms" in [GS78]. The "black nodes" of [GS78] are represented by 
node(s, i, t), and the "red nodes" are represented by red(node(s, i, t)). Some of the 
where clauses restricting substitutions for variables in the equations are 
semantically unnecessary, but are required to avoid illegal overlapping of left-hand sides 
(see restriction 4 in Section 5). 


Example 16.3.5 

Symbols 

: Constructors for search trees 
tree: 1; 
node: 3; 
red: 1; 
nil: 0; 

: Operations on search trees 
insert: 2; 
member: 2; 

: Symbols used in the definition of insert 
inserti: 2; 
insertl, insertr: 2; 

: Arithmetic and logical operators 
if: 3; 
less: 2; 
equ: 2; 
include integer_numerals, truth_values. 

For all c, i, j, k, l, m, s, t, u, v, w, x, y, z: 

: The symbol tree marks the root of a search tree. 

tree(red(s)) = tree(s); 

: insert(i, t) inserts the integer i into the search tree t. 

insert(i, tree(s)) = tree(insert(i, s)) 
   where s is either node(t, j, u) or nil() end or end where; 

insert(i, nil()) = red(node(nil(), i, nil())); 

insert(i, red(s)) = red(insert(i, s)) 
   where s is either node(t, j, u) or nil() end or end where; 

insert(i, node(red(s), j, red(t))) = 
   red(inserti(i, node(s, j, t))); 

insert(i, node(s, j, t)) = inserti(i, node(s, j, t)) 
   where s, t are either node(u, k, v) or nil() end or end where; 

insert(i, node(s, j, t)) = inserti(i, node(s, j, t)) 
   where s is red(w), t is either node(u, k, v) or nil() end or end where; 

insert(i, node(s, j, t)) = inserti(i, node(s, j, t)) 
   where t is red(w), s is either node(u, k, v) or nil() end or end where; 

: inserti(i, t) inserts the integer i into the search tree t, assuming that t is a 
: tree with at least two nodes, and the sons of the root are not both red. 

inserti(i, node(s, j, t)) = 
   if(equ(i, j), node(s, j, t), 
      if(less(i, j), insertl(i, node(s, j, t)), 
         insertr(i, node(s, j, t)))); 

: insertl(i, t) inserts the integer i into the left part of the tree t. 

insertl(i, node(red(node(s, j, red(node(t, k, u)))), l, v)) = 
   red(inserti(i, node(node(s, j, t), k, node(u, l, v)))) 
   where s is either node(w, m, x) or nil() end or end where; 

insertl(i, node(s, j, t)) = node(insert(i, s), j, t) 
   where s is either node(u, k, v) or nil() or red(node(w, l, x)) 
      where w, x are either node(y, m, z) or nil() end or end where 
   end or end where; 

: insertr(i, t) inserts the integer i into the right part of the tree t. 

insertr(i, node(s, j, red(node(t, k, red(u))))) = 
   red(inserti(i, node(node(s, j, t), k, u))) 
   where t is either node(w, m, x) or nil() end or end where; 

insertr(i, node(s, j, red(node(red(node(t, k, u)), l, v)))) = 
   red(inserti(i, node(node(s, j, t), k, node(u, l, v)))) 
   where v is either node(w, m, x) or nil() end or end where; 

insertr(i, node(s, j, t)) = node(s, j, insert(i, t)) 
   where t is either node(u, k, v) or nil() or red(node(w, l, x)) 
      where w, x are either node(y, m, z) or nil() end or end where 
   end or end where; 

: member(i, t) is true if and only if the integer i appears in the tree t. 

member(i, nil()) = false; 

member(i, tree(s)) = member(i, s) 
   where s is either node(t, j, u) or nil() end or end where; 
member(i, red(s)) = member(i, s); 
member(i, node(s, j, t)) = if(equ(i, j), true, 
                              if(less(i, j), member(i, s), 
                                 member(i, t))); 

if(true, s, t) = s;   if(false, s, t) = t; 

include equint, lessint. 

□ 

Notice that the membership test operator ignores balancing issues. Some 
schemes for maintaining balanced search trees attach explicit balancing behavior to 
the membership test. Such a technique is undesirable in the equational definitions 
of search trees, because the lack of side-effects during evaluation means that any 
balancing done explicitly by a membership test will not benefit the execution of 
any other operations. The outermost evaluation strategy used by the interpreter 
has the effect of delaying execution of insertions until the resulting tree is used, so 
the actual computation is in fact driven by membership tests. It would be nice to 
take advantage of the observation in [GS78] that balancing may also be made 
independent of insertion, but that approach seems to lead into violations of the 
nonoverlapping and left-sequential requirements of Section 5. 


17. Sequential and Parallel Equational Computations 

This section develops a formal mechanism for distinguishing the sequentializable
equational definitions supported by the current version of the equation interpreter,
from inherently parallel definitions, such as the parallel or. Section 19 goes
further, and shows that sequential systems, such as the Combinator Calculus,
cannot stepwise simulate the parallel or.


17.1. Term Reduction Systems 
Term reduction systems, also called subtree replacement systems [O’D77], and
term rewriting systems [HL79], are a formal generalization of the sort of reduction
system that can be defined by sets of equations.


Definition 17.1.1


Let Σ be a ranked alphabet, with ρ(a) the rank of a for a ∈ Σ.
A Σ-tree is a tree with nodes labelled from Σ.

Σ_# = {α | α is a Σ-tree & every node in α labelled by a has exactly ρ(a) sons}.

□


If Σ is the set of symbols described in the Symbols section of an equational
definition, then Σ_# is the set of well-formed expressions that may be built from
those symbols.

The following definition is based on Rosen’s notion of rule schemata [Ro73]
(Def. 6.1, p.175).
Definition 17.1.2


Let Σ be a ranked alphabet, and let V be an ordered set of nullary symbols not in Σ
to be used as formal variables. A rule schema is a pair <α,β>, written α→β,
such that:

(1) α,β ∈ (Σ∪V)_#

(2) α ∉ V

(3) Every variable in β is also in α.

A rule schema α→β is left linear if, in addition, no variable occurs more than once
in α.

Formal variables of rule schemata will be written as fat lower case letters, such as
x, y, z.

Assume that the variables in a left linear rule schema α→β are always chosen
so that x_1, ..., x_m occur in α in order from left to right.

If α→β is a rule schema over Σ, with variables x_1, ..., x_m ∈ V, and γ_1, ..., γ_m ∈ Σ_#,
then α[γ_1/x_1, ..., γ_m/x_m] → β[γ_1/x_1, ..., γ_m/x_m] is an instance of α→β.

Let S be a finite set of rule schemata over Σ. The system <Σ_#, →_S> is a term
reduction system, where →_S is the least relation satisfying

α →_S β if α→β is an instance of a schema in S;

α →_S β ⇒ γ[α/x] →_S γ[β/x] where γ ∈ (Σ∪V)_# contains precisely one occurrence
of the variable x ∈ V, and no other variables.

The subscript S is omitted whenever it is clear from context.

□
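
The definitions above can be illustrated by a minimal sketch in Python. It is not the interpreter's implementation: the tuple representation of terms, the use of Python strings for formal variables, and the names match, substitute, step, and normal_form are all assumptions made for this illustration. The sketch matches left-linear rule schemata against a term and applies one reduction step at an outermost redex.

```python
# Terms are nested tuples ("f", arg_1, ..., arg_k); a constant is ("a",).
# Formal variables of rule schemata are plain strings such as "x" (an assumption).

def match(pattern, term, env=None):
    """Return a substitution turning the left-linear pattern into term, or None."""
    env = {} if env is None else env
    if isinstance(pattern, str):          # a variable matches any term
        env[pattern] = term
        return env
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        if match(p, t, env) is None:
            return None
    return env

def substitute(term, env):
    """Replace every variable in term by its binding in env."""
    if isinstance(term, str):
        return env[term]
    return (term[0],) + tuple(substitute(a, env) for a in term[1:])

def step(term, rules):
    """Apply one rule at an outermost redex (leftmost first); None if in normal form."""
    for lhs, rhs in rules:
        env = match(lhs, term)
        if env is not None:
            return substitute(rhs, env)
    for i, arg in enumerate(term[1:], start=1):
        reduced = step(arg, rules)
        if reduced is not None:
            return term[:i] + (reduced,) + term[i + 1:]
    return None

def normal_form(term, rules, limit=1000):
    """Repeat single steps until no redex remains (bounded, since reduction may diverge)."""
    for _ in range(limit):
        nxt = step(term, rules)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("no normal form found within the step limit")

TRUE, FALSE, A, B = ("true",), ("false",), ("a",), ("b",)
IF_RULES = [
    (("if", TRUE, "x", "y"), "x"),    # if(true, x, y) = x
    (("if", FALSE, "x", "y"), "y"),   # if(false, x, y) = y
]
```

For instance, normal_form(("if", ("if", TRUE, FALSE, TRUE), A, B), IF_RULES) first reduces the inner conditional to false and then selects b.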

Every set S of equations following the context-free syntax of the equation
interpreter defines a term reduction system, in which →_S represents one step of
computation. The restrictions on equations from Section 5 induce a special subclass
of term reduction systems.

Definition 17.1.3


Let S be a set of rule schemata.
S is nonoverlapping if for all pairs of (not necessarily different) schemata α_1→β_1
and α_2→β_2, the following holds (without loss of generality assuming that the
variables are chosen in some standard order):

Let α_1'→β_1' and α_2'→β_2' be instances of the schemata above, with
α_1' = α_1[γ_{1,1}/x_1, ..., γ_{1,m}/x_m]. Suppose that α_1' = δ[α_2'/x], where δ
contains precisely one occurrence of x, δ≠x. Then there is an i, 1≤i≤m, such that
this occurrence of α_2' lies within the substituted term γ_{1,i}.

Intuitively, S is nonoverlapping if rule schemata may only be overlaid at variable
occurrences, and at the root.

A set of rule schemata S is consistent if, for all pairs of schemata
α_1→β_1, α_2→β_2 ∈ S, the following holds. If α→β_1', α→β_2' are instances of the
schemata above, then β_1' = β_2'.

Intuitively, β_1 and β_2 are the same up to choice of variable names in the two
schemata.

If S is a nonoverlapping, consistent set of left-linear rule schemata, then
<Σ_#, →_S> is a regular term reduction system.

□


This definition, adapted from [O’D77], is essentially the same as Huet and Lévy’s
[HL79] and Klop’s [Kl80a] (Ch.II), but Klop allows bound variables in expressions,
and instead of requiring consistency, Huet-Lévy and Klop outlaw overlap at
the root entirely. The regular reduction systems are those defined by equations
satisfying restrictions 1-4 from Section 5. Restriction 5 is a bit more subtle, and is
treated in Sections 17.2 and 17.3.


17.2. Sequentiality 


In discussion of procedural programming languages, "sequential" normally means
"constrained to operate sequentially." Reduction systems are almost never
constrained to operate sequentially, since any number of the redexes in an expression
may be chosen for replacement. Rather, certain reduction systems are constrained
to be interpreted in parallel. So, a sequential reduction system is one that is
allowed to operate sequentially. For example, the equations for the conditional

if(true, x, y) = x; if(false, x, y) = y


allow for parallel evaluation of all three arguments to an if, but they also allow 
sequential evaluation of the first argument first, then whichever of the other two 
arguments is selected by the first one. The sequential evaluation described above 
has the advantage of doing only work that is required for the final answer -- the 
parallel strategy is almost certain to perform wasted work on the unselected argu- 
ment. By contrast, consider the parallel or, defined by 

or(true, x) = true; 


or(x, true) = true; 


or(false, false) = false 


The or operator seems to require parallel evaluation of its two arguments. If
either one is selected as the first to be evaluated, it is possible that that evaluation
will be infinite, and will prevent the unselected argument from evaluating to true.
In the presence of the parallel or, there appears to be no way to avoid wasted
evaluation work in all cases.
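
The contrast can be sketched in Python under a simplifying assumption: each evaluation is modelled as a generator that yields None once per step of work and finally yields its Boolean value. This modelling and the names diverge, after, run, sequential_if, and parallel_or are illustrative only and are not part of the equation interpreter.

```python
def diverge():
    """An evaluation that never finishes; each yield is one step of work."""
    while True:
        yield None

def after(steps, value):
    """An evaluation that works for `steps` steps and then produces `value`."""
    for _ in range(steps):
        yield None
    yield value

def run(evaluation):
    """Run a single evaluation to completion (loops forever if it diverges)."""
    for result in evaluation:
        if result is not None:
            return result

def sequential_if(cond, then_branch, else_branch):
    """Evaluate the condition first, then only the branch it selects."""
    return run(then_branch) if run(cond) else run(else_branch)

def parallel_or(left, right):
    """Interleave the two evaluations, one step each in turn."""
    pending = [left, right]
    while pending:
        for evaluation in list(pending):
            result = next(evaluation)
            if result is True:
                return True                  # or(true, x) = true; or(x, true) = true
            if result is False:
                pending.remove(evaluation)   # keep stepping the other argument
    return False                             # or(false, false) = false
```

With these definitions, sequential_if(after(1, True), after(0, False), diverge()) never touches the diverging branch, while parallel_or(diverge(), after(3, True)) returns true only because both arguments are stepped in turn; every step spent on the diverging argument is wasted work.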


It is not obvious how to formally distinguish the sequential reduction systems
from the parallel ones. Huet and Lévy [HL79] discovered a reasonable technical
definition, based on a similar definition of sequential predicates by Kahn and
Plotkin [KP78]. The key idea in Huet’s and Lévy’s definition is to regard a redex as a
not-yet-known portion of an expression. Certain portions of an expression, e.g., γ
in if(true, β, γ), need not be known in order to determine a normal form. Others,
e.g. α in if(α, β, γ), must be known. A sequential term reduction system is one in
which we may always identify at least one unknown position (redex) that must be
known to find a normal form.

In order to discuss unknown portions of an expression, Huet and Lévy use the
nullary symbol ω to represent unknowns, and ≥ for the relation "more defined than."


Definition 17.2.1 [HL79]

Let α,β ∈ (Σ∪V∪{ω})_#.

α ≥ β if β is obtained from α by replacing zero or more subexpressions in α with ω.

Let S be a set of rule schemata. α ∈ (Σ∪{ω})_# is a partial redex if there is a rule
schema α'→β ∈ S such that α' ≥ α.

α is a definite potential redex if there is a rule schema β→γ ∈ S and a term α',
such that α' ≥ α and α' →* β.

α is root stable if α is not a definite potential redex.

A total normal form for (Σ∪{ω})_# is a normal form in Σ_#.

Let α ∈ (Σ∪{ω})_#, and let α have no total normal form.
An index of α is a term α' ∈ (Σ∪{ω,x})_# with precisely one occurrence of x, such
that α = α'[ω/x], and

∀β≥α (β has a total normal form) ⇒ ∃γ≠ω β ≥ α'[γ/x].

A regular reduction system is sequential if every term containing ω with no total
normal form has at least one index.

□
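
As a small illustration of the relation ≥ and of partial redexes, here is a hedged Python sketch. It reads the definition literally: following Definition 17.1.2, formal variables are treated as nullary symbols, so a variable in a left-hand side must appear as ω in a partial redex. The tuple representation and the names more_defined and is_partial_redex are assumptions of this sketch.

```python
OMEGA = ("ω",)                       # the unknown portion of a term
TRUE, FALSE = ("true",), ("false",)
X = ("x",)                           # a formal variable, treated as a nullary symbol

def more_defined(a, b):
    """Return True when a ≥ b, i.e. b is a with zero or more subterms replaced by ω."""
    if b == OMEGA:
        return True
    return (a[0] == b[0] and len(a) == len(b)
            and all(more_defined(s, t) for s, t in zip(a[1:], b[1:])))

# Left-hand sides of the parallel or from Section 17.2.
OR_LEFT_SIDES = [("or", TRUE, X), ("or", X, TRUE), ("or", FALSE, FALSE)]

def is_partial_redex(term, left_sides=OR_LEFT_SIDES):
    """A term is a partial redex if some left-hand side is more defined than it."""
    return any(more_defined(lhs, term) for lhs in left_sides)
```

Under this reading or(ω, ω) and or(true, ω) are partial redexes of the parallel or. Intuitively, neither ω in or(ω, ω) can be singled out as an index, since either argument alone may turn out to be true; that is the informal reason the parallel or is not sequential.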


Unfortunately, sequentiality is not a decidable predicate, and even indexes of 
sequential systems are not always computable [HL79]. So, Huet and Lévy define a 
stronger property called strong sequentiality. Essentially, a reduction system is 
strongly sequential if its sequentiality does not depend on the right-hand sides of 
rules. 

Definition 17.2.2 [HL79]

α →_ω β if β is the result of replacing some redex in α with an arbitrarily chosen
term in Σ_#. Let α ∈ (Σ∪{ω})_#, and let there be no total normal form δ such that
α →_ω* δ. An index α' of α is a strong index of α if

∀β≥α β →_ω* δ & (δ is a total normal form) ⇒ ∃γ≠ω β ≥ α'[γ/x].

A set of rule schemata is strongly sequential if every term α containing ω such that
α is in (nontotal) normal form, but for no δ in total normal form does α →_ω* δ,
contains at least one strong index.

α is a potential redex if there is a rule schema β→γ ∈ S and a term α', such that
α' ≥ α and α' →_ω* β.

α is strongly root stable if α is not a potential redex.

□


Huet and Lévy give a decision procedure to detect strongly sequential systems,
and an algorithm to find a strong index when one exists, or report strong stability
when there is no index. These are precisely the sorts of algorithms needed by the
equation interpreter. The sequentiality detection is used in a preprocessor to detect
and reject nonsequential equations. The other algorithms are used to determine a
traversal order of an input expression, detecting redexes when they are met. When
an expression is found to be strongly root stable, the root may be output
immediately, and the remaining subexpressions evaluated in any convenient order.
Unfortunately, Huet’s and Lévy’s algorithms are complex enough that it is not clear
whether they have acceptable implementations for the equation interpreter. So, the
current version of the equation interpreter applies the substantially stronger
restriction of strong left-sequentiality to guarantee a particularly simple algorithm for
choosing the sequential order of evaluation.

In Section 19, in trying to characterize the nondeterministic computational
power of equationally defined languages, we treat another simplified notion of
sequentiality. Simple strong sequentiality is defined to allow choice of a sequential
order of evaluation by a process with no memory, simply on the basis of ordering
the arguments to each function symbol independently.


Definition 17.2.3

Let S be a strongly sequential set of rule schemata. S is simply strongly sequential
if there is a sequencing function s: (Σ∪{ω})_# → (Σ∪{ω,x})_# such that, for all
partial redexes α and β, containing at least one ω each, s(α) is a strong index in α,
and s(α)[s(β)/x] is a strong index in s(α)[β/x].

□

A system with left-hand sides f(a,a), g(f(b,x)), h(f(x,b)) is strongly sequential,
but not simply strongly sequential, because the sequencing function cannot
choose an index in f(ω,ω) without knowing whether there might be a g or h above.


17.3. Left-Sequentiality 


The strongly left-sequential systems of equations, defined in this section, were
designed to support an especially simple pattern-matching and sequencing
algorithm in the equation interpreter. Section 18.2.3 presents that algorithm, and
shows that it succeeds on precisely the strongly left-sequential systems.


Definition 17.3.1

A Σ-context is a term in (Σ∪{ω})_#.

An instance of a context α is any term or context β resulting from the replacement
of one or more occurrences of ω in α, i.e., β ≥ α.

A left context is a context α such that there is a path from the root of α to a leaf,
with no occurrences of ω on or to the left of the path, and nothing but ωs to the
right of the path.

A left-traversal context is a pair <α,l>, where α is a left context, and l is a
node on the path dividing ωs from other symbols in α.

An index α' of α is a strong root index of α if

∀β≥α β →_ω* δ & δ is root stable ⇒ ∃γ≠ω β ≥ α'[γ/x]

A redex β in a term α is essential if α = α'[β/x] where α' is a strong index of α.

A redex β in a term α is root-essential if α = α'[β/x] where α' is a strong root
index of α.

□


A context represents the information known about a term after a partial traversal.
The symbol ω stands for an unknown portion. A left-traversal context contains
exactly the part of a term that has been seen by a depth-first left traversal that has
progressed to the specified node. →_ω from Definition 17.2.2 is the best
approximation to reduction that may be derived without knowing the right-hand sides of
equations. A strong root index is an unknown portion of a term that must be at
least partially evaluated in order to produce a strongly root stable term. Since →_ω
allows a redex to reduce to anything, a strong root index must be partially
evaluated to produce a redex, which may always be ω-reduced to a normal form.


In the process of reducing a term by outermost reductions, our short-term goal
is to make the whole term into a redex. If that is impossible, then the term is root
stable, and may be cut down into independent subproblems by removing the root.


Definition 17.3.2

A set of equations is strongly left-sequential if there is a set of left-traversal
contexts L such that the following conditions hold:

1. For all <α,l> in L, the subtree of α rooted at l is a redex.

2. For all <α,l> in L, β an instance of α, l is essential to β.

3. For all left-traversal contexts <α,l> not in L, β an instance of α, a
root-essential redex of β does not occur at l.

4. Every term is either root stable or an instance of a left context in L.

□


In a strongly left-sequential system, we may reduce a term by traversing it in
preorder to the left. Whenever a redex is reached, the left-traversal context
specifying that redex is checked for membership in L. If the left context is in L, the
redex is reduced. Otherwise, the traversal continues. When no left context in L is
found, the term must be root stable, so the root may be removed, and the resulting
subterms processed independently. (1) and (2) guarantee that only essential
redexes are reduced. (3) guarantees that no root-essential redex is skipped. (4)
guarantees that the reduction never hits a dead end by failing to choose any redex.
The analogous property to strong left-sequentiality, using reduction instead of
ω-reduction, is undecidable. Notice that strong left-sequentiality, like strong
sequentiality, depends only on the left-hand sides of equations, not on the right-hand
sides.

Strongly left-sequential sets of equations are intended to include all of those
systems that one might reasonably expect to process by scanning from left to right.
Notice that Definition 17.3.2 does not explicitly require L to be decidable. Also, a
strongly left-sequential system may not necessarily be processed by
leftmost-outermost evaluation. Rather than requiring us to reduce a leftmost redex,
Definition 17.3.2 merely requires us to decide whether or not to reduce a redex in
the left part of a term, before looking to the right. Every redex that is reduced
must be essential to finding a normal form. When the procedure decides not to
reduce a particular redex, it is only allowed to reconsider that choice after
producing a root-stable term and breaking the problem into smaller pieces. Section 18.2.3
shows a simple algorithm for detecting and processing strongly left-sequential
systems. While strongly left-sequential systems are defined to allow a full depth-first
traversal of the term being reduced, the algorithm of Section 18.2.3 avoids
searching to the full depth of the term in many cases by recognizing that certain
subterms are irrelevant to choosing the next step.


18. Crucial Algorithms and Data Structures for Processing 
Equations 
Design of the implementation of the equation interpreter falls naturally into four
algorithm and data structure problems.

1. Choose an efficient representation for expressions.

2. Invent a pattern-matching algorithm to detect redexes.

3. Invent an algorithm for selecting the next redex to reduce.

4. Choose an algorithm for performing the selected reduction.


The four subsections of this section correspond roughly to the four problems above.
For sequential equations, the choice of the next redex to reduce is intimately
tangled with pattern matching, so the sequencing problem is treated in Section 18.2,
along with pattern matching. For inherently parallel equations, an additional
mechanism is required to manage the interleaved sequence of reductions. This
mechanism has not been worked out in detail, but what we know of it is described
in Section 18.3. The current version of the equation interpreter uses the most
obvious procedure for performing reductions. Section 18.4 mentions some future
optimizations to be tested for that problem.


18.1. Representing Expressions 


Because reduction to normal form involves repeated structural changes to an
expression, some sort of representation of tree structure by pointers seems
inescapable. There are still a number of options to consider, and we are only beginning to
get a feeling for which is best. The most obvious representation of expressions is
similar to the one used in most implementations of LISP to represent S-expressions.
Each function and constant symbol occupies one field of a storage record, and the
other fields contain pointers to the representations of the arguments to that
function. It is quite easy in this representation to avoid multiple copies of common
subexpressions, by allowing several pointers to coalesce at a single node. Thus,
although the abstract objects being represented are trees, the representations are
really directed acyclic graphs. Figures 18.1.1 and 18.1.2 show the expression

h(f(g(a,b), g(a,b)), g(a,b))

with and without sharing of the subexpression g(a,b).
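
The difference between the two representations can be sketched in a few lines of Python. This is only an illustration, not the interpreter's storage layout: the Node class, the tuple form of expressions, and the functions build_tree, build_dag, and count_nodes are hypothetical names. Note that build_dag coalesces all equal subexpressions, which for this expression coincides with sharing g(a,b).

```python
class Node:
    """One storage record: a symbol plus pointers to the argument records."""
    def __init__(self, symbol, *sons):
        self.symbol = symbol
        self.sons = sons

def build_tree(term):
    """Allocate a fresh record for every occurrence of a symbol (no sharing)."""
    return Node(term[0], *(build_tree(a) for a in term[1:]))

def build_dag(term, table=None):
    """Allocate one record per distinct subexpression, so equal ones coalesce."""
    table = {} if table is None else table
    if term not in table:
        table[term] = Node(term[0], *(build_dag(a, table) for a in term[1:]))
    return table[term]

def count_nodes(root):
    """Count the distinct storage records reachable from root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) not in seen:
            seen.add(id(node))
            stack.extend(node.sons)
    return len(seen)

G = ("g", ("a",), ("b",))
EXPR = ("h", ("f", G, G), G)     # h(f(g(a,b), g(a,b)), g(a,b))
```

Here count_nodes(build_tree(EXPR)) gives 11 records, while count_nodes(build_dag(EXPR)) gives 5, one each for h, f, g, a, and b.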


Figure 18.1.1


Figure 18.1.2


Unlike LISP, whose nodes are all either nullary or binary (0 or 2 pointers),
the equation interpreter requires as many pointers in a node as the arity of the
function symbol involved. The issue of how best to handle the variety of arities has
not been addressed. Presumably, some small maximum number of pointers per
physical node should be chosen, perhaps as small as 2, and virtual nodes of higher
arities should be linked together from the smaller physical nodes, although
varying-size physical nodes are conceivable as a solution. The best choice of
physical node size, as well as the structure for linking them together (probably a linear
list, but possibly a balanced tree) can best be chosen after experience with the
interpreter yields some statistics on its use of storage. Currently, we simply
allocate a node size sufficient to handle the highest arity occurring in a set of
equations. This is certainly not the right solution in the long run, but since storage
limits have not been a limiting factor in our experimental stage, it was a good way to
get started.


Depending on the pattern-matching and sequencing algorithms chosen, it may
be necessary to keep pointers from sons to fathers in the representation of an
expression, as well as the usual father-son pointers. The most obvious solution --
one back pointer per node -- does not work, because the sharing of subexpressions
allows a node to have arbitrarily many fathers. Some sort of linked list of fathers
seems unavoidable. An elegant representation avoids the use of any extra nodes to
link together the fathers. Both the son-father and father-son links are subsumed by
a circular list of a son and all of its fathers. When a father appears on this list, it
is linked to the next node through a son pointer, corresponding to the argument
position held by the unique son on that list with respect to that particular father.
The unique son on the list is linked through a single extra father pointer. Figure
18.1.3 shows the expression

h(f(g(a,b), g(a,b)), g(a,b))

again, with the g(a,b) subexpressions shared, represented with circular father-son
lists. Since each node may participate in a number of circular lists equal to its
arity, the pointers must actually point to a component of a node, not to the node as
a whole. Notice that, in order to get from a father to its ith son, it is necessary to
follow the circular list going through the father’s ith son pointer until that list hits
the component corresponding to a father pointer. The node containing the unique
father pointer in the circular list is the unique son on that list. The possibilities for
breaking a virtual node of high arity into a linked structure of smaller physical
nodes are essentially the same as with the LISP-style representation above, except
that the linkage within a virtual node must also be two-way or circular. The
pattern-matching algorithm used in the current version of the equation interpreter
does not require back pointers, so the simpler representation is used. An earlier
version used back pointers, and we have not had sufficient experience with the
interpreter to rule out the possibility of returning to that representation in a later
version.


Figure 18.1.3


18.2. Pattern Matching and Sequencing 


The design of the equation interpreter was based on the assumption that the time
overhead of pattern matching would be critical in determining the usability of the
interpreter. Each visit to a node in the equation interpreter corresponds roughly to
one recursive call or return in LISP, so the pattern-matching cost per node visit
must compete with manipulation of the recursion stack in LISP. So, we put a lot
of effort into clever preprocessing that would allow a run-time pattern-matching
cost of a few machine instructions per node visit. By contrast, most
implementations of Prolog use a crude pattern-matching technique based on sequential search,
and depend on the patterns involved being simple enough that this search will be
acceptably quick. An indexing on the leftmost symbol of a Prolog clause limits the
search to those clauses defining a single predicate, but even that set may in
principle be quite large. Prolog implementations were designed on the assumption that
unification [Ko79b] (instantiation of variables) is the critical determinant of
performance. Only an utterly trivial sort of unification is used in the equation
interpreter, so our success does not depend on that problem.


We do not have sufficient experience yet to be sure that the pains expended on
pattern matching will pay off, but if equational programming succeeds in providing
a substantially new intuitive flavor of programming, extremely efficient pattern
matching is likely to be essential. Pattern matching based on sequential search
allows the cost of running a single step of a program to grow proportionally to the
number of equations and complexity of the left-hand sides. This growth
discourages the use of many equations with substantial left-hand sides. Equational
programs with only a few equations, with simple left-hand sides, tend to be merely
syntactically sugared LISP programs, and therefore not worthy of a new
implementation effort when so many good LISP processors are already available.


All of our pattern-matching algorithms are based on the elegant
string-matching algorithm using finite automata by Knuth, Morris, and Pratt [KMP77],
and its extension to multiple strings by Aho and Corasick [AC75]. The essential
idea is that a set of pattern strings may be translated into a finite automaton, with
certain states corresponding to each of the patterns. When that automaton is run
on a subject string, it enters the accepting state corresponding to a given pattern
exactly whenever it reaches the right end of an instance of that pattern in the
subject. Furthermore, the number of states in the automaton is at most the sum of the
lengths of the patterns, and the time required to build it is linear (with a very low
practical overhead) in the lengths as well. In particular, the automaton always
contains a tree structure, called a trie, in which each leaf corresponds to a pattern,
and each internal node corresponds to a prefix of one or more patterns. As long as
the subject is matching some pattern or patterns, the computation of the automaton
goes from the root toward the leaves of that tree. The clever part of the algorithm
involves the construction of backward, or failure, transitions for those cases where
the next symbol in the subject fails to extend the pattern prefix matched so far. A
finite automaton, represented by transition tables, is precisely the right sort of
structure for fast pattern matching, since the processing of each symbol in the
subject requires merely a memory access to that table, and a comparison to determine
whether there is a match.
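
The string-matching idea can be made concrete with a compact Python sketch of an Aho-Corasick style automaton: a trie of the patterns plus failure transitions computed breadth-first. It is an illustration only, not code from the interpreter; the names build_automaton and find_all and the use of dictionaries instead of dense transition tables are choices made for this sketch.

```python
from collections import deque

def build_automaton(patterns):
    """Build the trie (goto), the failure transitions, and the output lists."""
    goto, fail, out = [{}], [0], [[]]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pat)
    # Breadth-first pass: the failure state of a node is the longest proper
    # suffix of its prefix that is also a prefix of some pattern.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(subject, patterns):
    """Return (start position, pattern) for every occurrence in subject."""
    goto, fail, out = build_automaton(patterns)
    state, hits = 0, []
    for pos, ch in enumerate(subject):
        while state and ch not in goto[state]:
            state = fail[state]          # follow failure transitions
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((pos - len(pat) + 1, pat))
    return hits
```

Running find_all("ushers", ["he", "she", "his", "hers"]) reports she at position 1 and both he and hers at position 2, visiting each subject symbol once.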


Tree pattern matching is different enough from the string case that
exploration of several extensions of the Aho-Corasick technique to trees consumed
substantial effort in the early stages of the interpreter project. Those efforts are
described more fully in [HO82a, HO84], and are merely summarized here. In
addition to the problems arising from working with trees instead of strings, we must
face the problem of incremental execution of the pattern matcher after each
reduction step. It is clearly unacceptable to reprocess the entire expression that is being
reduced after each local reduction step, so our pattern-matching algorithm must be
able to pick up some stored context and rescan only the portion affected by each
change.


A decisive factor in the final choice of a pattern matcher turned out to be its
integration with the sequencer. Although not so decisive in the design of the
current implementation, the added complication of dealing with disjunctive where
clauses restricting the substitutions for variables may ruin an algorithm that runs
well on patterns with unrestricted variable substitutions.


18.2.1. Bottom-Up Pattern Matching


Three basic approaches were found, each allowing for several variations. The first,
and perhaps most obvious, is called the bottom-up method, because information
flows only from the leaves of a tree toward the root. Each leaf is assigned a
matching state corresponding to the constant symbol appearing there. Each function
symbol has a table, whose dimension is equal to the arity of the symbol, and that
table determines the matching state attached to each node bearing that function
symbol, on the basis of the states attached to its sons. Every matching state may
be conceived as representing a matching set of subtrees of the given patterns that
can all match simultaneously at a single subtree of a subject. In particular, certain
states represent matches of complete patterns. In the special case where every
function symbol has arity 1, the bottom-up tables are just the state-transition tables
for the Aho-Corasick string matching automaton [HO82a].

Example 18.2.1.1

Consider the pattern f(g(h(x), h(a)), h(y)), with variables x and y. The
matching sets and states associated with this pattern are:

1: {x, y}

2: {x, y, a}

3: {x, y, h(x), h(y)}

4: {x, y, h(x), h(y), h(a)}

5: {x, y, g(h(x), h(a))}

6: {x, y, f(g(h(x), h(a)), h(y))}


Set 6 indicates a match of the entire pattern. Assuming that there is one more
nullary symbol, b, the symbols correspond to tables as follows:


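a: 2        b: 1

h (indexed by the state of the son):

      1  2  3  4  5  6
      3  4  3  3  3  3

f (rows: state of the first son; columns: state of the second son):

      1  2  3  4  5  6
   1| 1  1  1  1  1  1
   2| 1  1  1  1  1  1
   3| 1  1  1  1  1  1
   4| 1  1  1  1  1  1
   5| 1  1  6  6  1  1
   6| 1  1  1  1  1  1

g (rows: state of the first son; columns: state of the second son):

      1  2  3  4  5  6
   1| 1  1  1  1  1  1
   2| 1  1  1  1  1  1
   3| 1  1  1  5  1  1
   4| 1  1  1  5  1  1
   5| 1  1  1  1  1  1
   6| 1  1  1  1  1  1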


Figure 18.2.1.1 shows the matching states assigned to all nodes in the tree 


representing the expression 


AGES GAG), h(a)), h(b)), f (eh), h(O)), h(a). 
□
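
The bottom-up assignment of matching states in this example can be sketched in Python. The tables below are written out from the matching sets 1-6 of the pattern f(g(h(x), h(a)), h(y)); the dictionary encoding and the names TABLES and match_state are assumptions of this sketch, not the interpreter's data structures.

```python
STATES = range(1, 7)

# One table per symbol, indexed by the states of the sons.
TABLES = {
    "a": 2,                                   # state 2 = {x, y, a}
    "b": 1,                                   # state 1 = {x, y}
    "h": {s: (4 if s == 2 else 3) for s in STATES},
    "g": {(s, t): (5 if s in (3, 4) and t == 4 else 1)
          for s in STATES for t in STATES},
    "f": {(s, t): (6 if s == 5 and t in (3, 4) else 1)
          for s in STATES for t in STATES},
}

def match_state(term):
    """Compute the matching state of a term from the states of its sons."""
    symbol, sons = term[0], term[1:]
    if not sons:
        return TABLES[symbol]
    key = tuple(match_state(son) for son in sons)
    return TABLES[symbol][key[0] if len(key) == 1 else key]

A, B = ("a",), ("b",)
SUBJECT = ("f", ("g", ("h", B), ("h", A)), ("h", B))   # f(g(h(b), h(a)), h(b))
```

For SUBJECT the states flow upward as h(b) = 3, h(a) = 4, g(3, 4) = 5, and finally f(5, 3) = 6, signalling a match of the whole pattern at the root.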


The bottom-up method is ideal at run time. An initial pass over the input
expression sets all of the match states, which are stored at each node. After a
reduction, the newly created right-hand side symbols must have their match states
computed, plus a certain number of nodes above the point of the reduction may
have their match states changed. The length of propagation of changes to states is
at most the depth of the deepest left-hand side, and is usually shorter than that.
Because of sharing, there may be arbitrarily many paths toward the root along
which changes must be propagated. The more complex representation of
expressions allowing traversal from sons to fathers as well as fathers to sons must be used
with the bottom-up method.


Figure 18.2.1.1


Unfortunately, the bottom-up method gets into severe trouble with the size of
the tables, and therefore with the preprocessing time required to create those tables
as well. There are two sources of explosion:


1. A symbol of arity n requires an n-dimensional table of state transitions.
Thus, the size of one symbol’s contribution to the tables is s^n, where s is the
number of states.


2. In the worst case, the number of states is nearly 2^p, where p is the sum of
the sizes of all the patterns (left-hand sides of equations).


The exponential explosion in number of states dominates the theoretical worst case,
but never occurred in the two years during which the bottom-up method was used.
In [HO82a] there is an analysis of the particular properties of patterns that lead to
the exponential blowup. Essentially, it requires subpatterns that are incomparable
in their matching power -- subjects may match either one, both, or neither. For
example, f(x,a) and f(a,x) are incomparable in this sense, because f(b,a)
matches the first, f(a,b) matches the second, f(a,a) matches both, and f(b,b)
matches neither. Such combinations of patterns are quite unusual in equational
programs.
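
The incomparability of f(x,a) and f(a,x) can be checked mechanically. The following Python sketch uses a naive matcher; the tuple representation, the use of strings for variables, and the names matches and match_set are assumptions made for this illustration.

```python
def matches(pattern, subject):
    """Variables are strings; other terms are tuples (symbol, arg_1, ..., arg_k)."""
    if isinstance(pattern, str):
        return True
    return (pattern[0] == subject[0] and len(pattern) == len(subject)
            and all(matches(p, s) for p, s in zip(pattern[1:], subject[1:])))

A, B = ("a",), ("b",)
PATTERNS = {"f(x,a)": ("f", "x", A), "f(a,x)": ("f", A, "x")}

def match_set(subject):
    """Return the names of the patterns that match at the root of subject."""
    return {name for name, p in PATTERNS.items() if matches(p, subject)}
```

The four subjects f(b,a), f(a,b), f(a,a), and f(b,b) yield four different match sets, so each combination needs its own matching state; with many such incomparable subpatterns the number of distinct sets can grow exponentially.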


The theoretically more modest increase in table size due to the arity of
symbols had a tremendous impact in practice. By suitable encodings, such as Currying
(see Section 4.4), the maximum arity may be reduced to 2. In principle, reducing
the arity could introduce an exponential explosion in the number of states, but this
never happened in practice. Unfortunately, for moderate sized equational
programs, such as those used in the syntactic front end of the equation interpreter
itself, even quadratic size of tables is too much, leading to hour-long preprocessing.
Tables may be compressed, by regarding them as trees with a level of branching
for each dimension, and sharing equivalent subtrees. The resulting compression is
quite substantial in practice, giving the appearance of a linear dependence with
constant factor around five or ten. The time required to generate the large tables
and then compress them is still unacceptable. For a short time, we used a program
that produced the compressed tables directly, but it required a substantial amount
of searching for identical subtrees, and the code was so complicated as to be
untrustworthy. Cheng, Omdahl, and Strawn (Iowa State University) have made
an extensive experimental study of several techniques for improving the bottom-up
tables for a particular set of patterns arising from APL idioms. Their work
suggests that a carefully hand-tuned bottom-up approach may be good for static sets
of patterns, but substantial improvement in the algorithm is needed to make it a
good choice when ever-changing pattern sets require completely automatic
preprocessing.

The problems with table size alone may well have killed the bottom-up 
method, but more trouble arose when we considered the sequencing problem. The 
only bottom-up method for sequencing that we found was to keep track simultaneously
of the set of subpatterns matched by a node (the match set), and the set of
subpatterns that might come to match as a result of reductions at descendants of
the node (the possibility set). The reduction strategy would be driven by an
attempt to narrow the difference between the match set and the possibility set at a
node. When no redex appears in the possibility set at a node, that node is stable,
and may be output. A precise definition of possibility sets appears in the appendix
to [HO79], and may also be derived from [HL79]. Possibility sets, just as match 
sets, may be enumerated once during preprocessing, then encoded into numerical 
state names. Possibility sets may explode exponentially, even in cases where the 
match sets do not. Possibility sets were never implemented for the equation inter- 
preter, and the bottom-up versions actually performed a complete reduction 
sequence, by adding each redex to a queue as it was discovered, then reducing the 
one at the head of the queue - a very wasteful procedure. Bottom-up pattern 
matching in the equation interpreter will probably not be resurrected until a 
simpler sequencing technique is found for it. 

A final good quality of the bottom-up method, although not good enough to 
save it, is its behavior with respect to ors in where clauses. A left-hand side of an
equation in the form

E = F where x is either G_1 or ... or G_n end or end where

may be treated as a generalized pattern of the form E[or(G_1,...,G_n)/x] (the
special expression or(G_1,...,G_n) is substituted for each occurrence of x). A
pattern of the form or(G_1,...,G_n) matches an expression when any one of
G_1,...,G_n matches. The subpatterns involved in these ors have no special impact
on the number of states for bottom-up matching.


18.2.2. Top-Down Pattern Matching 


The second approach to pattern-matching that was used in the equation interpreter 
is the top-down approach. Every path from root to leaf in a pattern is taken as a
separate string. Numerical labels indicating which branch is taken at each node
are included, as well as the symbols at the nodes. Variable symbols, and the
branches leading to them, are omitted. If one string is a prefix of another one
associated with the same pattern, it may be omitted. 

Example 18.2.2.1 

The expression f(g(a, x), h(y, b)), whose tree form is shown in Figure 18.2.2.1,
produces the strings f1g1a, f1g, f2h1a, f2h2b. The string f1g may be omitted,
because it is a prefix of f1g1a.


Figure 18.2.2.1 


□
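The derivation of path strings can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the interpreter's code: terms are nested tuples headed by their symbol, a bare string is a variable, and the names `path_strings` and `show` are hypothetical. The pattern f(g(a,x),b) is a made-up example chosen for brevity.

```python
def path_strings(pattern):
    """Root-to-leaf paths of a pattern, as tuples of symbols and branch numbers."""
    paths = []

    def walk(term, prefix):
        here = prefix + (term[0],)
        sons = term[1:]
        if not sons:
            paths.append(here)                 # a constant leaf ends a path
        for number, son in enumerate(sons, start=1):
            if isinstance(son, str):
                paths.append(here)             # variable: omit it and its branch
            else:
                walk(son, here + (number,))

    walk(pattern, ())
    unique = list(dict.fromkeys(paths))        # drop duplicates, keep order
    # A path that is a proper prefix of another path of the same pattern is dropped.
    return [p for p in unique
            if not any(q != p and q[:len(p)] == p for q in unique)]


def show(path):
    """Render a path tuple in the compact notation used in the text."""
    return "".join(str(item) for item in path)


pattern = ("f", ("g", ("a",), "x"), ("b",))    # f(g(a,x),b)
print([show(p) for p in path_strings(pattern)])   # ['f1g1a', 'f2b']
```

Paths are kept as tuples rather than strings so that the prefix test stays correct for multi-character symbols or branch numbers above 9.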


The Aho-Corasick algorithm may be applied directly to the set of strings derived in 
this way. Then, in a traversal of a subject tree, we may easily detect the leaf-ends 
of all subject paths that match a pattern path. By keeping states of the automaton 
on the traversal stack, we may avoid restarting the automaton from the root for
each different path.


The hard part of the top-down method is correlating all the individual path
matches to discover complete matches of patterns. We tried two ways of doing
this. In the first, a counter for each pattern is associated with each node on the 
recursion stack. Each time a path is matched, the appropriate counter at the root 
end is incremented. If the traversal stack is stored as an array, rather than a 
linked list, the root end may be found in constant time by using a displacement in 
the stack. Whenever a counter gets up to the number of leaves in the appropriate 
pattern, a match is reported. In the worst case, every counter could go up nearly 
to the number of leaves in the pattern, leading to a quadratic running time. In a 
short experiment with this method, carried out by Christoph Hoffmann’s advanced 
compiler class in 1982, such a case was not observed. [HO82a] analyzes the qualities
of patterns that lead to good and bad behavior for the top-down method with
counters, but two other problems led to its being abandoned.


After a change in the subject tree, resulting from a reduction step, the region 
above the change must be retraversed, and the counters corrected. This requires 
that the old states associated with the region be saved for comparison with the new 
ones, since only a change from acceptance to rejection, or vice versa, results in a 
change to a counter. This retraversal was found to be quite clumsy when it was 
tried. Also, no sequencing method for top-down pattern matching with counters
was ever discovered.
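The counter scheme can be illustrated with a small Python sketch. It is only an approximation of the method described above: path matches are detected by a naive suffix comparison on the traversal stack rather than by the Aho-Corasick automaton, and incremental retraversal is not modelled. The term representation (nested tuples), the function name `find_matches`, and the sample pattern f(g(a,x),b) with its two path strings are assumptions made for the example.

```python
def find_matches(subject, paths):
    """Report positions where every path string of one pattern matches."""
    found = []
    trail = []        # symbols and branch numbers along the current path
    counters = []     # one counter per node on the traversal stack

    def walk(node, position):
        trail.append(node[0])
        counters.append(0)
        for path in paths:
            if len(path) <= len(trail) and tuple(trail[-len(path):]) == path:
                # Displacement in the stack locates the root end of the path.
                counters[-1 - len(path) // 2] += 1
        for number, son in enumerate(node[1:], start=1):
            trail.append(number)
            walk(son, position + (number,))
            trail.pop()
        if counters.pop() == len(paths):       # all paths were seen below here
            found.append(position)
        trail.pop()

    walk(subject, ())
    return found


# Path strings f1g1a and f2b of the hypothetical pattern f(g(a,x),b).
paths = [("f", 1, "g", 1, "a"), ("f", 2, "b")]
subject = ("f", ("f", ("g", ("a",), ("c",)), ("b",)), ("b",))
print(find_matches(subject, paths))            # [(1,)]
```

At the root of the sample subject only the path f2b matches, so its counter stops at 1; at the first son both paths match and a match is reported.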


Another variation on top-down pattern matching, using bit strings to correlate 
the path matches, led to much greater success. Each time a path matches, a bit 
string is created with one bit for each level of the pattern. All bits are set to 0,
except for a 1 at the level of the leaf just matched. These bit strings are combined
in a bottom-up fashion, shifting as they go up the tree, and intersecting all the bit
strings at sons of a node to get the bit string for that node. When a 1 appears at
the root level, a match is detected. The details are given in [HO82a].


Example 18.2.2.2 
Consider again the pattern f(g(h(x), h(a)), h(y)) from Example 18.2.1.1, running
on the expression

f(f(g(h(a), h(a)), h(b)), f(g(h(b), h(b)), h(a))).

Figure 18.2.2.2 shows the tree representation of this expression. A * is placed at
the leaf end of each path matching a root-to-leaf path in the pattern. Each node is
annotated with the bit string that would be computed for it in the bottom-up portion
of the matching.
□


Careful programming yields an algorithm in which the top-down and bottom-up 
activities are combined in a single traversal, and bit strings, as well as automaton 
states, are only stored on the traversal stack, not at all nodes of the tree. Multiple
patterns are handled conceptually by multiple bit strings. It is easy, however, to
pack all of the strings together, and perform one extra bitwise and with every shift
to prevent bits shifting between the logical bit strings.


The top-down method with bit strings adapts well to incremental matching. 
After a change in the tree, merely reenter the changed portion from its father. The 
traversal stack still contains the state associated with the father, and the original 
processing of the changed portion has not affected anything above it, so there is no 
special retraversal comparable to the bottom-up method, or the top-down method 
with counters. Sequencing is handled by one more bit string, with one bit for each 
level of the pattern. In this string, a bit is 1 if there is a possible match at that
level. The two bit strings used by the top-down method, in fact, correspond to the
match sets and possibility sets of the bottom-up method. Since the top-down processing
has already determined which root-to-leaf paths of a pattern are candidates
for matching at a given point, the match and possibility sets need only deal with
subpattern positions along a single path. Thus, instead of precomputing sets of
subpatterns, then numbering them, we may store them explicitly as bit strings,
using simple shifts and bitwise ands and ors to combine them. Such a technique
did not work for the bottom-up method, because the shifts would become complicated
shuffle operations, depending on the tree structure of the patterns.

Figure 18.2.2.2


The top-down pattern matching techniques do not perform as well as the 
bottom-up method with respect to disjunctions in where clauses restricting the
substitutions for variables in equations. No better strategy has been found than to
treat each disjunctive pattern as a notation for its several complete alternatives. 
For example, 

f(x, y) = g(x) where x is either h(u) or a end or,
                     y is either h(v) or b end or
               end where

is equivalent to the four equations

f(h(u), h(v)) = g(h(u));
f(h(u), b) = g(h(u));
f(a, h(v)) = g(a);
f(a, b) = g(a)


Notice that the effect is multiplicative, so a combinatorial explosion may result 
from several disjunctive where clauses in the same equation. Such an explosion 
only occurred once in our experience with the interpreter, but when it did, it was
disastrous. At the end of Section 18.2.3 we discuss briefly the prospects for avoiding
the disjunctive explosion without returning to bottom-up pattern matching, with
its own explosive problems.
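The multiplicative effect of disjunctive where clauses is easy to reproduce. The Python sketch below is illustrative only: the tuple representation of terms, the variable names u and v, and the helpers `substitute` and `expand` are assumptions, not part of the interpreter. It expands an equation into one simple equation per combination of alternatives.

```python
from itertools import product


def substitute(term, binding):
    """Replace variables (bare strings) in a term according to the binding."""
    if isinstance(term, str):
        return binding.get(term, term)
    return (term[0],) + tuple(substitute(t, binding) for t in term[1:])


def expand(lhs, rhs, alternatives):
    """One equation per combination of the alternatives for each variable."""
    names = list(alternatives)
    equations = []
    for choice in product(*(alternatives[name] for name in names)):
        binding = dict(zip(names, choice))
        equations.append((substitute(lhs, binding), substitute(rhs, binding)))
    return equations


lhs, rhs = ("f", "x", "y"), ("g", "x")
alternatives = {"x": [("h", "u"), ("a",)], "y": [("h", "v"), ("b",)]}
equations = expand(lhs, rhs, alternatives)
print(len(equations))                          # 4
```

With k qualified variables having n_1, ..., n_k alternatives, the list has n_1 * ... * n_k entries, which is the explosion described in the text.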


18.2.3. Flattened Pattern Matching 


The final pattern-matching method, used in the current version of the interpreter, 
uses only one string per tree pattern. Tree patterns are flattened into preorder 
strings, omitting variables. The Aho-Corasick algorithm [AC75] is used to produce 
a finite automaton recognizing those strings. Each state in the automaton is anno- 
tated with a description of the tree moves needed to get to the next symbol in the 
string, or the pattern that is matched, if the end of the string has been reached. 
Such descriptions need only give the number of edges (≥0) to travel upwards
toward the root, and the left-right number of the edge to follow downwards. For
example, the patterns (equation left-hand sides) f(f(a,x),g(a,y)) and g(x,b)
generate the strings ffaga and gb, and the automaton given in Figure 18.2.3.1.
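The flattening step, together with the tree moves that annotate each state, may be sketched as follows. This Python fragment is a hedged illustration rather than the interpreter's implementation: terms are assumed to be nested tuples with single-character symbols, bare strings are variables, and `flatten` is a hypothetical name. Each move is a pair (edges to climb, son number to descend).

```python
def flatten(pattern):
    """Preorder string of a pattern plus the tree move that follows each symbol."""
    nodes = []                                   # (symbol, position) in preorder

    def walk(term, position):
        nodes.append((term[0], position))
        for number, son in enumerate(term[1:], start=1):
            if not isinstance(son, str):         # variables are omitted
                walk(son, position + (number,))

    walk(pattern, ())
    string = "".join(symbol for symbol, _ in nodes)
    moves = []
    for (_, here), (_, there) in zip(nodes, nodes[1:]):
        up = len(here) - (len(there) - 1)        # edges to climb toward the root
        moves.append((up, there[-1]))            # then descend to this son number
    return string, moves


first = ("f", ("f", ("a",), "x"), ("g", ("a",), "y"))    # f(f(a,x),g(a,y))
second = ("g", "x", ("b",))                              # g(x,b)
print(flatten(first))    # ('ffaga', [(0, 1), (0, 1), (2, 2), (0, 1)])
print(flatten(second))   # ('gb', [(0, 2)])
```

The move after g is (0, 1) in the first pattern and (0, 2) in the second, which is the disagreement that the annotation step has to detect.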


The automaton cannot be annotated consistently if conflicting moves are asso- 
ciated with the same state. Such conflicts occur precisely when there exist preorder 
flattened strings of the forms αβγ and βδ, such that the annotations on the last
symbol of β in the two strings are different. These differences are discovered
directly by attempts to reassign state annotations in the automaton when α is the
empty string, and by comparing states at opposite ends of failure edges when α is
not empty. Fortunately, the cases in which the flattened pattern-matching automaton
cannot be annotated properly are exactly the cases in which equations violate
the restrictions of Section 5. When γ and δ are not empty, the conflicting annotations
are both tree moves, and indicate a violation of the left-sequentiality restriction
5. When one of γ, δ is the empty string, the corresponding annotation reports a
match, and there is a violation of restriction 3 or 4. In the example above, there is
a conflict with α=ffa, β=g, γ=a, δ=b. That is, after scanning ffag, the first pattern
directs the traversal down edge number 1, and the second pattern directs the
traversal down edge number 2. This conflict is discovered because there is a failure
edge between states with those two annotations.

---->  forward edge
- - -> failure edge
failure edges not shown all lead to the start state
u means move up one level in the tree
d1 means move down to son number 1
m1 means a match of pattern number 1

Figure 18.2.3.1
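The conflict condition on strings αβγ and βδ can be tested directly, without building the automaton. The Python sketch below is a brute-force check under stated assumptions: each flattened string is supplied with one annotation per symbol, either a tree move written as (edges up, son number) or a match report, and these annotation lists are written out by hand for the two examples. The function name `conflicts` is hypothetical, and the real interpreter finds the same conflicts through failure edges instead.

```python
def conflicts(flattened):
    """Return (alpha, beta, annotation1, annotation2) for each conflict found."""
    found = []
    for i, (s, ann_s) in enumerate(flattened):
        for j, (t, ann_t) in enumerate(flattened):
            for start in range(len(s)):
                if i == j and start == 0:
                    continue                     # a string trivially agrees with itself
                for length in range(1, min(len(t), len(s) - start) + 1):
                    if s[start:start + length] != t[:length]:
                        break                    # beta cannot be extended further
                    a, b = ann_s[start + length - 1], ann_t[length - 1]
                    if a != b:
                        found.append((s[:start], t[:length], a, b))
                        break                    # report the shortest beta only
    return found


# f(f(a,x),g(a,y)) and g(x,b)
example1 = [("ffaga", [(0, 1), (0, 1), (2, 2), (0, 1), ("match", 1)]),
            ("gb", [(0, 2), ("match", 2)])]
# f(g(x,a),y) and g(b,c)
example2 = [("fga", [(0, 1), (0, 2), ("match", 1)]),
            ("gbc", [(0, 1), (1, 2), ("match", 2)])]
print(conflicts(example1))   # [('ffa', 'g', (0, 1), (0, 2))]
print(conflicts(example2))   # [('f', 'g', (0, 2), (0, 1))]
```

A conflict between two tree moves corresponds to a violation of left-sequentiality; a conflict in which one annotation is a match report corresponds to an overlap of left-hand sides.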


The restriction imposed on equations by the pattern-matching strategy above
may be justified in a fashion similar to the justification of deterministic parsing
strategies. That is, we show that the algorithm succeeds (generates no conflicts)
on every set of equations that is strongly left-sequential according to the abstract
definition of strong left-sequentiality in Section 17.3.


Theorem 18.2.3.1 

The flattened pattern-matching algorithm succeeds (i.e., generates no conflicts) if 
and only if the input patterns are left-hand sides of a regular and strongly left- 
sequential set of equations. 

Proof sketch:

(=>) If the pattern-matching automaton is built with no conflicts, then L of
Definition 17.3.2 may be defined to be the set of all left-traversal contexts <χ,l>
such that l is the root of a redex in χ, and l is visited by the automaton, when
started at the root of χ.

(<=) If a conflict is found in the pattern-matching automaton, then there are two
flattened preorder strings αβγ and βδ derived from the patterns, with conflicting
tree moves from β to γ and from β to δ. Without loss of generality, assume that
there are no such conflicts within the two occurrences of β. αβ, with its associated
tree moves, defines a context χ, which is the smallest left context allowing the
traversal specified by αβ. β defines a smaller left-traversal context φ in the same
way. φ is contained as a subterm in χ, in such a way that the last nodes visited in
the two traversals coincide. If one or both of γ, δ is empty, then χ demonstrates a
violation of restriction (4) or (3), respectively. So, assume that γ, δ are not empty,
and the annotations at the ends of the βs are both tree moves.

Consider the two positions to the right of χ specified by the two conflicting traversal
directions for αβ and β. Expand χ to ψ by filling in the leftmost of these two
positions with an arbitrary redex, and let n be the root of this added redex. Let
equ_1 be the equation associated with whichever of αβγ, βδ directed traversal
toward this leftmost position, and let equ_2 be the equation associated with the
remaining one of αβγ, βδ. <ψ,n> cannot be chosen in L, because there is an
instance ψ' of ψ in which a redex occurs above n matching the left-hand side of
equ_2, and ψ' may be ω-reduced to normal form at this redex, without reducing the
redex at n in ψ'. <ψ,n> cannot be omitted from L, because there is another
instance ψ'' of ψ in which everything but n matches the redex associated with equ_1,
and n is therefore root-essential to ψ''.


For example, the pair of equation left-hand sides f(g(x,a),y) and g(b,c) have the
preorder strings fga and gbc. A conflict exists with α=f, β=g, γ=a, δ=bc. The
first equation directs the traversal down edge 2 after seeing fg, and the second
equation directs it down edge 1. The conflicting prefixes fg and g produce the context
f(g(ω,ω),ω). The context above is expanded to the left-traversal context consisting
of f(g(g(b,c),ω),ω) with the root of g(b,c) specified. This left-traversal
context cannot be chosen in L (i.e., it is not safe to reduce the redex g(b,c) in this
case), because the leftmost ω could be filled in with a to produce
f(g(g(b,c),a),ω), which is a redex of the form f(g(x,a),y), and can be ω-reduced
to normal form in one step, ignoring the smaller redex g(b,c). But, this
left-traversal context may not be omitted from L (i.e., it is not safe to omit reducing
g(b,c)), because the leftmost ω may also be filled in with c to produce
f(g(g(b,c),c),ω), and reduction of g(b,c) is essential to get a normal form or a
root-stable term.


The flattened pattern-matching method is so simple, and so well suited to the
precise restrictions imposed for other reasons, that it was chosen as the standard
pattern matcher for the equation interpreter. In fact, this method developed from
an easy way to check the restrictions on equations, which was originally intended to 
be added to the top-down method. As long as the interpreter runs on a conventional 
sequential computer, the flattened pattern-matching method will probably be used 
even when the interpreter is expanded to handle nonsequential sets of equations. In 
that case, conflicting annotations will translate into fork operations. By exploiting 
sequentiality insofar as it holds, the overhead of keeping track of multiple processes 
will be minimized. An interesting unsolved problem is to detect and exploit more 
general forms of sequentiality than left-sequentiality in an efficient manner. There
are likely to be NP-hard problems involved.


The flattened method has the same shortcoming as the top-down method with 
respect to disjunctive where clauses. The current version merely treats a qualified 
pattern as if it were several simple patterns, causing an exponential explosion in the 
worst case. Brief reflection shows that the disjunctive patterns are, in effect, 
specifications of nondeterministic finite automata. The current version translates 
these into deterministic automata, preserving whatever portions are already deter- 
ministic in an obvious way. Usually the translated automaton is far from minimal. 
In most cases, there exists a deterministic automaton whose size is little or no 
greater than the nondeterministic one. A good heuristic for producing the minimal, 
or a nearly minimal, deterministic automaton directly, without explicitly creating 
the obvious larger version, would improve the practicality of the interpreter 
significantly. We emphasize the fact that this technique would probably only be 
heuristic, since the problem of minimizing an arbitrary nondeterministic finite 
automaton, or producing its minimal deterministic equivalent, provides a solution to
the inequivalence problem for nondeterministic finite automata, which is
PSPACE-complete [GJ79].


We have discussed the incremental operation of each pattern matcher at 
matching time, as changes are made to the subject. As a user edits an equational 
program, particularly with the modular constructs of Section 14, it is desirable to 
perform incremental preprocessing as well, to avoid reprocessing large amounts of 
unchanged patterns because of small changes in their contexts. Robert Strandh is 
currently studying the use of suffix (or position) trees [St84] to allow efficient 
incremental preprocessing for the Aho-Corasick automaton, used in both the top- 
down and flattened pattern matchers. We have an incremental processor that 
allows insertion and deletion of single patterns for a cost linear in the size of the 
change, independent of the size of the automaton. We hope to develop a processor 
to combine two sets of patterns for a cost dependent only on their interactions
through common substrings, and not on their total sizes.


18.3. Selecting Reductions in Nonsequential Systems of Equations 


In principle, nonsequential equational programs, such as those including the paral- 
lel or, may be interpreted by forking off a new process each time the flattened 
pattern-matching automaton indicates more than one node to process next. The 
overhead of a general-purpose multiprocessing system is too great for this applica- 
tion. So, an implementation of an interpreter for nonsequential equations awaits a 
careful analysis of a highly efficient algorithm and data structure for managing 
these processes. Several interesting problems must be handled elegantly by this 
algorithm and data structure. First, whenever a redex α is found, all processes
seeking redexes within α must be killed, since the work that they do may not make
sense in the reduced expression. This problem requires some structure keeping
track of all of the processes in certain subtrees. Substantial savings may be
gained by having that structure ignore tree nodes where there is no branching of
parallel processes.


Because of the sharing of identical subexpressions, two processes may wander 
into the same region, requiring some locking mechanism to avoid conflicts. It is not 
sufficient to have a process merely lock the nodes that it is actually working on, 
since the result would be that the locked out process would follow the working pro- 
cess around, duplicating a lot of traversal effort. On the other hand, it is too much 
to lock a process out of the entire path from the root followed by another process, 
since the new process might be able to do a reduction above the intersection of
paths, as a result of information found within the intersection.


Example 18.3.1 
Consider the equations 
f(g(h(x, y))) = x;
or(true, x) = true; or(x, true) = true

Suppose that we are reducing an intermediate expression of the form
or(f(g(h(true,α))), g(h(true,α))), with the common subexpression g(h(true,α))
shared. At the or node, a process A will start evaluating the left-hand subexpression
f(g(h(true,α))), and another process B will start evaluating the right-hand
subexpression g(h(true,α)). Perhaps process B reaches the common subexpression
g(h(true,α)) before A, and eventually transforms α into α'. If A waits at the
f node, it will miss the fact that there is a redex to be reduced. A needs to go into
the common subexpression just far enough to see the symbols g, h, and true.
Then, f(g(h(true,α'))) will be replaced by true, yielding or(true, g(h(true,α'))),
which reduces to true. At this point, process B is killed. If A waits on B, it may wait
arbitrarily long, or even forever. Figure 18.3.1 shows the expression discussed
above in tree form, with the interesting positions of A and B.
□


So, a process A wandering onto the path of process B must continue until it 
reaches the same state as B. At that point, A must go to sleep, until B returns to 
the node where A is. These considerations require a mechanism for determining 
which processes have visited a node, and in what states, and what processes are
sleeping at a given node.


Of course, a low-overhead sequencer for any set of equations could be devised, 
by adding each newly discovered redex to a queue, and always reducing the head 
redex in that queue. The resulting complete reduction sequence [O’D77] is very 
wasteful of reduction effort, and probably would not be acceptable except as a
temporary experimental tool.


18.4. Performing a Reduction Step 


Figure 18.3.1

At first, the implementation of reduction steps themselves seems quite elementary.
Simply copy the right-hand side of the appropriate equation, replacing variable
instances by pointers to the corresponding subexpressions from the left-hand side.
Right-hand side variables may easily be replaced during preprocessing by the
addresses of their instances on the left-hand side, so that no search is required.
For example, f(g(a,x),y) = h(x,y) may be represented by
f(g(a,?),?) = h(<1,2>,<2>), indicating that the first argument to h is the
2nd son of the 1st son of the redex node, and the second argument is the 2nd son
of the redex node. Given these addresses, we no longer need the names of the variables
at all. The symbol h may be overwritten on the physical node that formerly
contained the symbol f. One small, but crucial, problem arises even with this
simple approach. If there is an equation whose right-hand side consists of a single
variable, such as car[(x . y)] = x, then there is no appropriate symbol to overwrite
the root node of the redex (in this case containing the symbol car). Consider the
expression f[car[(α . β)]], for example, which reduces to f[α]. We could modify
the node associated with f, so that its son pointer goes directly to α, but then any
other nodes sharing the subexpression car[(α . β)] would not benefit by the reduction.
Finding and modifying all of the fathers of car[(α . β)], on the other hand,
might be quite expensive. The solution chosen [O’D77] is to replace the car node
by a special dummy node, with one son. Every time a pointer is followed to that
dummy node, it may be redirected to the son of the dummy. Eventually, the
dummy node may become disconnected and garbage collected for reuse. In effect,
this solution requires a reduction-like step to be performed for every pointer to the
redex car[(α . β)], but that step is always a particularly simple one, independent of
the complexity of the redex itself.
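Both ideas, right-hand sides with precomputed addresses and the dummy node for single-variable right-hand sides, fit in a short Python sketch. It is an assumption-laden illustration, not the interpreter's data structure: the class `Node`, the tag `IND`, and the helpers `son`, `fetch`, `build`, and `reduce_at` are hypothetical names. A right-hand side is either an address (a tuple of son numbers) or a tuple headed by a symbol, and cons stands in for the pair constructor.

```python
IND = "<indirection>"          # hypothetical tag for the special dummy node


class Node:
    def __init__(self, symbol, *sons):
        self.symbol = symbol
        self.sons = list(sons)


def son(node, number):
    """Follow a son pointer (1-based), redirecting it past any dummy nodes."""
    target = node.sons[number - 1]
    while target.symbol == IND:
        target = target.sons[0]
    node.sons[number - 1] = target
    return target


def fetch(redex, address):
    """Follow an address such as (1, 2) down from the redex node."""
    node = redex
    for number in address:
        node = son(node, number)
    return node


def is_address(template):
    return not template or isinstance(template[0], int)


def build(redex, template):
    if is_address(template):
        return fetch(redex, template)            # pointer to an existing subterm
    return Node(template[0], *(build(redex, t) for t in template[1:]))


def reduce_at(redex, rhs):
    """Overwrite the redex node in place with the instantiated right-hand side."""
    if is_address(rhs):                          # right-hand side is one variable
        redex.symbol, redex.sons = IND, [fetch(redex, rhs)]
    else:
        sons = [build(redex, t) for t in rhs[1:]]
        redex.symbol, redex.sons = rhs[0], sons


# f(g(a,x),y) = h(<1,2>,<2>)
x, y = Node("b"), Node("c")
redex = Node("f", Node("g", Node("a"), x), y)
reduce_at(redex, ("h", (1, 2), (2,)))

# car[(x . y)] = <1,1>, with the car node shared by two fathers
alpha, beta = Node("p"), Node("q")
car = Node("car", Node("cons", alpha, beta))
outer, other = Node("f", car), Node("k", car)
reduce_at(car, (1, 1))
```

After the second reduction, following the son pointer of either father reaches α and redirects that pointer, so the dummy node eventually becomes unreachable.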


As the left-hand sides of equations become more substantial, the collecting of 
pointers to variable instances, by the simple technique above, might become non- 
trivially costly. It appears that there are some simple variations in which pointers 
to variables would be collected during the pattern-matching traversal. A careful 
analysis is needed to determine whether the savings in retraversal costs would pay 
for the extra cost of maintaining pointers to variable positions in partially matching 
subexpressions, when the matching fails later on. In any case, the retraversal for 
variables could certainly be improved to collect all variables in a single retraversal, 
rather than going after each variable individually. Somewhat more sophisticated 
optimizations would include performing some of the pattern-matching work for
right-hand sides of equations during the preprocessing, and coalescing several
reduction steps into one. Substantial savings might also be achieved by reusing
some of the structure of a left-hand side in building the right. In particular the 
common form called tail recursion, in which a recursively defined function appears 
only outermost on a right-hand side, may be translated to an iteration by these 
means. For example, f(x) = if (p(x), a, f(g(x))) may be implemented by 
coalescing the evaluation of an application of f with the subsequent evaluation of 
if, and reusing the f node by replacing its argument, instead of the f node itself. 
The OBJ project [FGJM85] has achieved substantial speedups with optimizations 
of this general sort, but their applicability to the equation interpreter has never
been studied.


Another opportunity for improving performance arises when considering the 
strategy for sharing of subexpressions. The minimal sharing strategy should be to 
share all substitutions for a given variable x on the right-hand side of an equation. 
In fact, it is easier to program an interpreter with this form of sharing than one 
that copies out subterms for each instance of x. A natural improvement, not 
implemented yet, is to discover all shareable subterms of a right-hand side, during 
the preprocessing step. For example, in f(x, y) = g(h(x, y), h(x, y)), the entire
subexpression h(x, y) may be shared, as well as the substitutions for x and y.
Such an optimization could be rather cheap at preprocessing time, using a variation
of the tree isomorphism algorithm [AHU74], and need not add any overhead at
run time. More sophisticated strategies involve dynamic detection of opportunities
for sharing at run time, trading off overhead against the number of evaluation steps
required.


A simple sort of dynamic sharing is implemented in the current version of the
interpreter. This technique is based on the hashed cons idea from LISP, and was
proposed and implemented for the equation interpreter by Paul Golick. Whenever
a new storage node is created during reduction, its value (including the symbol and
all pointers) is hashed, and any identical node that already exists is discovered.
Upon discovery of an identical node, the new one is abandoned in favor of another
pointer to the old one. In order to maximize the benefit of this strategy, reductions
are performed, not by immediate replacement of a left-hand side by a right, but by
setting a pointer at the root of a redex to point to its reduced form. As discovered
during the traversal of an expression, pointers to redexes are replaced by pointers
to the most reduced version created so far. This use of reduction pointers subsumes
the dummy nodes discussed at the beginning of this section.


The hashed sharing strategy described above is rather cheap, and it allows 
programming techniques such as the automatic dynamic programming of Section 
15.4. The results of this strategy are unsatisfyingly sensitive, however, to the phy- 
sical structure of the reduction sequence, and the order of evaluation. 

Example 18.4.1 
Consider the equations 
f(a) = b;
c = a.

In the expression g(f(c), f(a)), c reduces to a, yielding g(f(a), f(a)). We
might expect the two instances of f(a) to be shared, but the hashed sharing strategy
will not accomplish this. Both nodes containing f existed in the original
expression, and had different son pointers at that time, so that there could be no
sharing. When c is replaced by a, the a occupies a new node, which is hashed,
discovering the existing a node, and creating a sharing of that leaf. The change in
the son pointer in the leftmost f node, however, does not cause that node to be
hashed again, so the fact that it may now be shared with another f node is not
discovered. Thus, two more reduction steps are required to reach the normal form
g(b, b), instead of the one that would suffice with sharing of f(c).
□


Only one reduction step is wasted by the failure of sharing in Example 18.4.1, but 
arbitrarily many reduction steps might be involved from f(a) to the normal form b
in a more complex example, and all of those steps would be duplicated.
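The behavior in Example 18.4.1 can be reproduced with a toy hashed-cons table. The Python sketch below is an illustration under assumptions, not Golick's implementation: the classes `Node` and `HashCons` and the method `make` are hypothetical, and node identity (Python's `id`) stands in for a storage address. Nodes are hashed once, at creation; a later change to a son pointer is not rehashed.

```python
class Node:
    def __init__(self, symbol, sons):
        self.symbol, self.sons = symbol, list(sons)


class HashCons:
    def __init__(self):
        self.table = {}        # keeps every node alive, so ids stay unique

    def make(self, symbol, *sons):
        """Create a node, or return the identical one that already exists."""
        key = (symbol, tuple(id(s) for s in sons))
        if key not in self.table:
            self.table[key] = Node(symbol, sons)
        return self.table[key]


heap = HashCons()
left_f = heap.make("f", heap.make("c"))
right_f = heap.make("f", heap.make("a"))
root = heap.make("g", left_f, right_f)         # g(f(c), f(a))

# Reduce c to a: the new a node is hashed and found to exist already ...
new_a = heap.make("a")
left_f.sons[0] = new_a                         # ... but this change rehashes nothing
shared_leaf = left_f.sons[0] is right_f.sons[0]
shared_f = left_f is right_f

# Rehashing the changed node would discover the identical f node.
rehashed = heap.make("f", *left_f.sons)
print(shared_leaf, shared_f, rehashed is right_f)   # True False True
```

The last lookup shows what rehashing a node after one of its sons changes would find; it leaves open the deletion of stale table entries, which the text notes has never been worked out.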


In order to achieve maximal sharing, we need to rehash a node each time any 
of its sons changes. This requires a more sophisticated use of hashing tables, 
allowing deletions as well as insertions, and the details have never been worked out. 
Perhaps some technique other than hashing should be applied in this case. Notice 
that, by allowing reduced terms to remain in memory, rather than actually replac- 
ing them, sharing may be accomplished between subexpressions that never exist 
simultaneously in any intermediate expression in a reduction sequence. Thus, no 
evaluation will ever be repeated. Such a complete sharing strategy would accom- 
plish an even stronger form of automatic dynamic programming than that 
developed in Section 15.4, in which the most naive recursive equations would be 
applied efficiently, as long as they did not require many different expressions to be 
evaluated. Of course, there is a substantial space cost for such completeness. In 
practice, a certain amount of space could be allocated, and currently unused nodes 
could be reclaimed only when space was exhausted, providing a graceful degradation
from complete sharing to limited amounts of reevaluation.


The sketch of a complete sharing strategy given above resembles, at least 
superficially, the directed congruence closure method of Chew [Ch80]. In
congruence closure [NO80], equations are processed by producing a congruence
graph, each node in the graph representing an expression or subexpression. As
well as the father-son edges defining the tree structure of the expressions, the 
congruence graph contains undirected edges between expressions that are known to 
be equal. Initially, edges are placed between the left- and right-hand sides of equa- 
tions, then additional edges are added as follows: 

1. (Transitivity) If there are edges from α to β and from β to γ, add an edge
from α to γ.

2. (Congruence) If there are edges from α_i to β_i for all i from 1 to n, add an
edge from f(α_1,...,α_n) to f(β_1,...,β_n).
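The two rules above can be sketched for variable-free equations as a naive fixpoint computation. This Python fragment is not one of the efficient algorithms of [NO80, DST80]: it is a quadratic-per-pass illustration in which a union-find structure supplies transitivity and a repeated pairwise scan supplies congruence. Ground terms are assumed to be nested tuples, and the name `congruence_closure` is hypothetical.

```python
def congruence_closure(terms, equations):
    """Return a predicate telling whether two known terms are provably equal."""
    parent = {}

    def add(term):
        if term not in parent:
            parent[term] = term
            for sub in term[1:]:
                add(sub)                       # every subexpression is a graph node

    def find(term):
        while parent[term] != term:
            term = parent[term]
        return term

    for term in terms:
        add(term)
    for left, right in equations:
        add(left)
        add(right)
        parent[find(left)] = find(right)       # edge between the two sides

    changed = True
    while changed:                             # repeat the congruence rule to a fixpoint
        changed = False
        nodes = list(parent)
        for i, s in enumerate(nodes):
            for t in nodes[i + 1:]:
                if (find(s) != find(t) and s[0] == t[0] and len(s) == len(t)
                        and all(find(a) == find(b)
                                for a, b in zip(s[1:], t[1:]))):
                    parent[find(s)] = find(t)
                    changed = True
    return lambda s, t: find(s) == find(t)


a, b, c = ("a",), ("b",), ("c",)
fa, fb = ("f", a), ("f", b)
gfa, gfb = ("g", fa), ("g", fb)
same = congruence_closure([gfa, gfb, c], [(a, b)])
print(same(fa, fb), same(gfa, gfb), same(a, c))    # True True False
```

From the single equation a = b, the first pass joins f(a) with f(b), and the second pass joins g(f(a)) with g(f(b)).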


A carefully designed algorithm for congruence closure [NO80, DST80] may be 
quite efficient for equations without variables, but must be modified for equations 
with variables, since there is no bound on the size of the graph that must be 
treated. The directed congruence closure method of Chew uses directed edges to 
indicate reductions, and adds nodes to the congruence graph only as they are 
needed to make progress toward a normal form. Every time a reduction edge is 
added, congruence closure is applied to the whole graph. Directed congruence clo- 
sure was shown by Chew to avoid all reevaluation of the same expression. We con- 
jecture that the rehashing strategy for dynamic sharing, sketched above, is essen- 
tially an optimization of the directed congruence closure method, in which closure 
is applied only to portions of the graph that turn out to be needed for progress 
toward a normal form. A careful implementation of some form of congruence clo- 
sure would be an extremely valuable option in a future version of the equation 
interpreter. For cases where repeated evaluation is not expected to arise anyway,
its overhead should be avoided by applying a less sophisticated sharing strategy.

Independently of the various issues discussed above, the equation interpreter
needs a way to reclaim computer memory that was allocated to a subexpression 
that is no longer relevant. We tried the two well-known strategies for reclaiming 
memory: garbage collection and reference counting. The first implementations 
used a garbage collector. That is, whenever a new node must be allocated, and 
there is no free space available, the whole expression memory is traversed, detect- 
ing disconnected nodes. Garbage collection has the advantage of no significant 
space overhead, and no time wasted unless all storage is actually used. Unfor- 
tunately, as soon as we ran large inputs to the interpreter, garbage collection 
became unacceptably costly. Typical garbage collections would only free up a 
small number of nodes, leading to another garbage collection with rather little 
reduction in between. In fact, it was usually faster to kill a computation, recompile 
the interpreter with a larger memory allotment, and start the program over, than to 
wait for a space-bound program to finish. Based on this experience we chose the 
reference count strategy, in which each node contains a count of the pointers to it. 
When that count reaches zero, the node is instantly placed on a free list. Refer- 
ence counting has a nontrivial space overhead, and adds a small time cost to each 
creation and deletion of a node. Unlike garbage collection, it does not cause all of
the active nodes to be inspected in order to reclaim inactive ones.
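A minimal sketch of the reference count strategy follows. It assumes a fixed-size expression memory indexed by integers; the class `Heap` and its methods are hypothetical names for illustration and do not reproduce the interpreter's actual data structures. Cyclic structures, which plain reference counting cannot reclaim, are not addressed.

```python
class Heap:
    """Fixed-size node memory with reference counts and a free list."""

    def __init__(self, size):
        self.free = list(range(size))      # indices of unused nodes
        self.label = [None] * size
        self.kids = [()] * size
        self.count = [0] * size            # number of pointers to each node

    def alloc(self, label, kids=()):
        """Allocate a node; each child gains one pointer from the new node."""
        if not self.free:
            raise MemoryError("expression memory exhausted")
        n = self.free.pop()
        self.label[n], self.kids[n], self.count[n] = label, tuple(kids), 0
        for k in self.kids[n]:
            self.count[k] += 1
        return n

    def incref(self, n):
        self.count[n] += 1

    def decref(self, n):
        """Drop one pointer to n; nodes whose count reaches zero are freed at once."""
        stack = [n]
        while stack:
            m = stack.pop()
            self.count[m] -= 1
            if self.count[m] == 0:
                stack.extend(self.kids[m])  # the freed node releases its children
                self.kids[m] = ()
                self.free.append(m)
```

Only the nodes that actually become unreachable are visited when a pointer is dropped, which is the contrast with garbage collection drawn above.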


19. Toward a Universal Equational Machine Language 


While the equation interpreter project has attempted to provide an efficient imple- 
mentation for the widest possible class of equational programs, other researchers 
have sought a fixed set of primitive functions defined by equations as a universal 
programming language. Pure LISP [McC60] may be viewed as a particular equa- 
tional program, defining a general purpose programming language interpreter. 
More recently, Backus [Ba78] has defined a Functional Programming language by 
a fixed set of equations. Turner [Tu79] suggests the Combinator Calculus as a 
universal language, into which all others may be compiled. There are a number of
attractions to a fixed, universal, equationally defined programming language:

1. The designer of such a language may choose primitives that encourage a
   particular programming style.

2. A well-chosen set of equations might be implemented by special-purpose
   techniques more efficient than the more general techniques used in the equation
   interpreter.

3. A particular equationally defined programming language might provide the
   machine language for a highly parallel computer.


While agreeing with the motivation for choosing a fixed set of equations, we believe 
that the criteria for such a choice are not well enough understood to allow it to be 
made on rational grounds. Aside from the subjective issues of convenience, and the 
technology-dependent issues of efficiency, there is no known method for establishing 
the theoretical sufficiency of a particular equational language to simulate all others. 
In this section, we investigate the theoretical foundations for universal equational 
programs, and produce evidence that the Combinator Calculus is not an appropriate
choice. Since FP, SASL, and many similar proposed languages compile into
combinators, they are also insufficient. Unfortunately, we have not found an
acceptable candidate to propose in its place, but we can characterize some of the
missing qualities that must be added.


First, consider the more usual procedural languages and their sequential 
machines. Accept for the moment the architectural schema of a Random Access 
Machine, with an unbounded memory accessed by a central processor capable of 
performing some finite set of primitive operations between finite sequences of stored 
values. Each choice of primitive-operation set determines a programming language 
and a machine capable of executing it directly. In order to build a general-purpose 
machine, we usually choose a set of primitive operations that is universal in the 
sense that every other finite set of computable operations may be compiled into the 
universal one. In theory textbooks, we often state only the result that some univer- 
sal set of operations is sufficient to compute all of the computable functions. In 
fact, we usually expect, in addition, that compiling one operation set into another 
has low complexity, and that the compiled program not only produces the same 
result as the source program, but does so by an analogous computation. The low
complexity of the compilation is not usually stated formally, but the analogousness
of the computation is often formalized as stepwise simulation.


A reasonable-seeming candidate for a universal reduction language is the
S-K Combinator Calculus, a language with the two nullary symbols S and K,
plus the binary symbol AP for application. For brevity, AP(α,β) is written αβ, and
parentheses associate to the left unless explicitly given. Reduction in the combinator
calculus is defined by

Kαβ → α
Sαβγ → αγ(βγ)

The Combinator Calculus is well-known to be capable of defining all of the
computable functions [Ch41, CF58, St72], and has been proposed as a machine
language [Tu79, CG80]. Certain computations, however, apparently cannot be
simulated by this calculus.
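The two reduction rules can be tried out with a small leftmost-outermost reducer. This is a sketch under stated assumptions: terms are represented as nested pairs `(function, argument)`, atoms are strings, any atom other than `"S"` and `"K"` is treated as an inert symbol, and the names `app`, `spine`, `step` and `normalize` are hypothetical.

```python
def app(head, *args):
    """Left-associated application: app(f, a, b) is the term (f a) b."""
    for a in args:
        head = (head, a)
    return head

def spine(t):
    """Split a term into its head atom and the list of its arguments."""
    args = []
    while isinstance(t, tuple):
        t, a = t
        args.append(a)
    return t, args[::-1]

def step(t):
    """Perform one leftmost-outermost step, or return None at a normal form."""
    head, args = spine(t)
    if head == "K" and len(args) >= 2:            # K a b -> a
        return app(args[0], *args[2:])
    if head == "S" and len(args) >= 3:            # S a b c -> a c (b c)
        a, b, c = args[:3]
        return app(a, c, (b, c), *args[3:])
    for i, a in enumerate(args):                  # otherwise look inside arguments
        r = step(a)
        if r is not None:
            return app(head, *args[:i], r, *args[i + 1:])
    return None

def normalize(t, limit=10_000):
    """Reduce to normal form, giving up after `limit` steps."""
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step limit")

I = app("S", "K", "K")
OMEGA = app("S", I, I, app("S", I, I))            # a term with no normal form
```

For example, `normalize(app("S", "K", "K", "x"))` yields `"x"`, and `normalize(app("K", "x", OMEGA))` also yields `"x"`, because the leftmost-outermost strategy discards the nonterminating argument without reducing it.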


Consider a language containing the Boolean symbols T and F, and the parallel
or combinator D, with the rules

DTα → T
DαT → T
DFF → F

Intuitively, in order to evaluate Dαβ we must evaluate α and β in parallel, in case
one of them comes out T while the other is undefined. On the other hand, it is
possible to evaluate combinatory S-K expressions in a purely sequential fashion,
by leftmost-outermost evaluation [CF58]. Thus, the only way to simulate the D
combinator in the S-K calculus seems to be to program what is essentially an
operating system, simulating parallelism by time-sliced multiprogramming. Such a
simulation appears to destroy the possibility of exploiting the parallelism in D, and
can hardly be said to produce an analogous computation to the original.
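The time-sliced simulation of parallel or can be sketched with Python generators standing in for the two argument computations. Each generator yields `None` for a step that has not yet produced a value, and finally yields `True` or `False`. The helper names `after`, `diverge` and `parallel_or` are assumptions made for this illustration.

```python
def after(n, value):
    """A computation that takes n silent steps and then produces value."""
    for _ in range(n):
        yield None
    yield value

def diverge():
    """A computation that never produces a value."""
    while True:
        yield None

def parallel_or(left, right):
    """Alternate single steps of the two computations, as a time-sliced scheduler would."""
    procs = [left, right]
    results = [None, None]
    while True:
        for i, p in enumerate(procs):
            if results[i] is None:
                v = next(p)                # run one step of computation i
                if v is True:
                    return True            # D T a -> T and D a T -> T
                if v is False:
                    results[i] = False
        if results == [False, False]:
            return False                   # D F F -> F
```

The rigid alternation is visible in the loop: neither computation can get ahead of the other by more than one step, regardless of how the work might have been divided on a truly parallel machine.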


This section formalizes the concept of simulation of one reduction system by 
another, and studies the powers of the S—K combinator calculus and its extensions 
by the parallel or (D) and arbitrary choice (A) operators. Section 19.1 defines 
reduction systems in a natural and very general way, and defines the confluence 
(Church-Rosser) property that holds for certain reduction systems. Section 19.2 
develops useful properties of the combinator calculi. Section 19.3 defines simulation
of one reduction system by another, gives examples of plausible simulations,
and shows that a weaker definition allows intuitively unacceptable "simulations."
Section 19.4 shows that the S-K calculus does not simulate the S-K-D calculus,
and that the S-K-A calculus is universal. Section 19.5 shows that the S-K calculus
simulates all simply strongly sequential systems. Section 19.6 shows that the
S-K-D calculus simulates all regular systems.


19.1. Reduction Systems 


The equational programs discussed in this book are viewed through the formalism 
of term reduction systems, presented in Section 17.1. The theoretical foundations 
for studying simulations of reduction systems seem to require a more general 
framework, where the states of a computation do not necessarily take the form of 
terms. Reduction systems are a more general class of formal computational struc- 
tures. The essence of a reduction system is a set of possible states of computation, 
and a relation that determines the possible transitions from one state to the next. 
States with no possible transitions are called normal forms, and represent situations 
in which the computation halts. There is no loss of generality in assuming that, in 
any state with a possible transition, some transition is taken. 

Definition 19.1.1

A reduction system is a pair <S, →>, where

S is a set of states

→ ⊆ S×S is a binary relation on S, called reduction.

In most cases, we refer to the system <S, →> as S, and use the same → symbol
in different contexts to indicate the reduction relation of different systems.

Such a reduction system is effective if

1. S is decidable (without loss of generality, S may be the nonnegative integers),

2. {β | α→β} is finite for all α, and the branching function n(α) = |{β | α→β}| is
   total computable,

3. the transition function t: S→P(S) defined by t(α) = {β | α→β} is total
   computable.

□

Intuitively, α→β means that a computation may go from α to β in one step.


Definition 19.1.2

Let <S, →> be a reduction system.

η∈S is a normal form if there is no γ such that η→γ.

N_S = {η∈S | η is a normal form}

□

Definition 19.1.3

A reduction system <S, →> is confluent if

∀α,β,γ∈S (α→*β & α→*γ) ⇒ ∃δ∈S (β→*δ & γ→*δ)

(See Figure 19.1.1)

□


The confluence property is often called the Church-Rosser property, since Church
and Rosser established a similar property in the λ calculus. The confluence property
is important because it guarantees that normal forms are unique, and that
normal forms may be found by following the → relation in the forward direction
only. For example, consider a reduction system with states α, β, γ, δ_1, δ_2, ···, and
reduction relation defined by α→β, α→γ, α→δ_1, δ_i→δ_{i+1}. β and γ are the only
normal forms. See Figure 19.1.2 for a picture of this reduction system. Because of
the failure of the confluence property in this reduction system, α has two different
normal forms, β and γ. Furthermore, δ_1 cannot be reduced to normal form, even
though it is equivalent to the normal forms β and γ according to the natural
equivalence relation generated by →. In order to find β or γ from δ_1, we must
take a reverse reduction to α.

Figure 19.1.1

Figure 19.1.2
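The example system can be written down as a transition function in the sense of Definition 19.1.1 and explored mechanically. In this sketch the states α, β, γ are the strings `"alpha"`, `"beta"`, `"gamma"`, and the integer i stands for δ_i. These encodings and the function names are assumptions for illustration, and the search is bounded because the system is infinite.

```python
def t(state):
    """Transition function t(α) = {β | α → β} for the example system."""
    if state == "alpha":
        return {"beta", "gamma", 1}        # α → β, α → γ, α → δ_1
    if isinstance(state, int):
        return {state + 1}                 # δ_i → δ_{i+1}
    return set()                           # β and γ have no successors

def reachable(state, max_steps):
    """All states reachable from `state` in at most max_steps forward steps."""
    seen = {state}
    frontier = {state}
    for _ in range(max_steps):
        frontier = {u for s in frontier for u in t(s)} - seen
        seen |= frontier
    return seen

def normal_forms(state, max_steps):
    """Normal forms found by following → forward for at most max_steps steps."""
    return {s for s in reachable(state, max_steps) if not t(s)}
```

Forward search from α finds both β and γ, which have no common successor, so confluence fails. Forward search from δ_1 finds no normal form within any bound.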


19.2. The Combinator Calculus, With Variants 


The S—K combinator calculus [Sc24, CF58, St72] was developed by logicians to 
demonstrate that the concept of a variable in mathematics may be eliminated in 
favor of more primitive concepts. More recently, Turner [Tu79, CG80] has pro- 
posed this calculus as a machine language, into which higher level languages may 
be compiled. Viewed as a reduction system, the combinator calculus is defined as 
follows. The particular reduction relation defined below is sometimes called weak
reduction [St72]. Strong reduction mimics the λ-calculus more closely, but
requires an extended set of terms.


Definition 19.2.1

The S-K calculus is the reduction system <C[S,K], →>, where

C[S,K] = {S,K,AP}_# is the set of all terms built from the constants S and K, and
the binary operation AP (as mentioned before, the AP operation is abbreviated by
juxtaposition);

→ is the least relation satisfying

Kαβ → α

Sαβγ → αγ(βγ)

α→β ⇒ αγ→βγ & γα→γβ

for all α,β,γ∈C[S,K].

The S-K-D calculus is the reduction system <C[S,K,D], →> obtained from
the S-K calculus by adding the constant symbol D and augmenting the relation
→ with the rules

DKα → K

DαK → K

D(K(SKK))(K(SKK)) → K(SKK)

(K represents truth and K(SKK) represents falsehood)

The S-K-A calculus is the reduction system <C[S,K,A], →> obtained from
the S-K calculus by adding the constant symbol A and augmenting the relation
→ with the rules

Aαβ → α

Aαβ → β

□

The S—K calculus is definable in the current version of the equation interpreter 
(see Section 9.9), and the S—K—D calculus will be handled by a future version. 
The S—K—A calculus is unlikely to be supported, because of its inherent indeter- 
minacy. See Section 15.2 for a discussion of the difficulties in dealing with indeter- 
minate constructs. 
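The one-step relation of Definition 19.2.1 can be enumerated directly, which makes the indeterminacy of A concrete. The sketch below computes the set of all terms reachable from a given term in exactly one step, for the symbols S, K, D and A together. Terms are nested pairs `(function, argument)` with string atoms, and the names `app`, `spine`, `root_reducts` and `reducts` are hypothetical.

```python
def app(head, *args):
    """Left-associated application: app(f, a, b) is the term (f a) b."""
    for a in args:
        head = (head, a)
    return head

def spine(t):
    """Split a term into its head atom and the list of its arguments."""
    args = []
    while isinstance(t, tuple):
        t, a = t
        args.append(a)
    return t, args[::-1]

TRUE = "K"
FALSE = app("K", app("S", "K", "K"))       # K(SKK)

def root_reducts(t):
    """Results of applying a rule to the whole term t."""
    head, args = spine(t)
    out = set()
    if len(args) == 2:
        a, b = args
        if head == "K":
            out.add(a)                     # K a b -> a
        if head == "A":
            out.update({a, b})             # A a b -> a  and  A a b -> b
        if head == "D":
            if a == TRUE or b == TRUE:
                out.add(TRUE)              # D K a -> K  and  D a K -> K
            if a == FALSE and b == FALSE:
                out.add(FALSE)
    if len(args) == 3 and head == "S":
        a, b, c = args
        out.add(app(a, c, (b, c)))         # S a b c -> a c (b c)
    return out

def reducts(t):
    """All terms u with t -> u, closing the rules under application contexts."""
    out = root_reducts(t)
    if isinstance(t, tuple):
        f, x = t
        out |= {(g, x) for g in reducts(f)}
        out |= {(f, y) for y in reducts(x)}
    return out
```

`reducts(app("A", TRUE, FALSE))` contains both truth values, each of which is a normal form, so no single deterministic strategy covers all computations of the S-K-A calculus.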

Conventional presentations of combinators usually include the additional symbol
I, with the rule Iα→α. For symbolic parsimony, we omit the I, since its effect
may be achieved by SKK, as SKKα → Kα(Kα) → α. The following properties of
the S-K calculus are well-known [St72], and clearly hold for S-K-D and S-K-A as
well.

Lemma 19.2.1

Let I = SKK.

Let α be a term built from S,K,D,A, the variables x,y,z,···, and the binary
operation AP (variables are not allowed in C[S,K], C[S,K,D], C[S,K,A] as
defined above).

Let α[β/x] be the result of replacing each occurrence of the variable x in α by β.

Let λx.α be defined by

λx.x = I

λx.y = Ky for x≠y

λx.S = KS

λx.K = KK

λx.D = KD

λx.A = KA

λx.(αβ) = S(λx.α)(λx.β)

Then, (λx.α)β →* α[β/x] for all α∈C[S,K,D]∪C[S,K,A].

□


Notice that the definitions above translate all λ terms into combinatory terms with
variables, and all λ terms with no free variables into C[S,K].
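The translation of Lemma 19.2.1 is short enough to run. The sketch below implements the abstraction rules and checks the substitution property on an example, using a leftmost-outermost reducer for the S and K rules. Terms are nested pairs `(function, argument)` with string atoms, variables are ordinary atoms, and all function names are assumptions made for the illustration.

```python
def app(head, *args):
    """Left-associated application: app(f, a, b) is the term (f a) b."""
    for a in args:
        head = (head, a)
    return head

def spine(t):
    args = []
    while isinstance(t, tuple):
        t, a = t
        args.append(a)
    return t, args[::-1]

def step(t):
    """One leftmost-outermost S-K step, or None at a normal form."""
    head, args = spine(t)
    if head == "K" and len(args) >= 2:
        return app(args[0], *args[2:])
    if head == "S" and len(args) >= 3:
        a, b, c = args[:3]
        return app(a, c, (b, c), *args[3:])
    for i, a in enumerate(args):
        r = step(a)
        if r is not None:
            return app(head, *args[:i], r, *args[i + 1:])
    return None

def normalize(t, limit=10_000):
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step limit")

I = app("S", "K", "K")

def abstract(x, t):
    """The term written λx.t in Lemma 19.2.1."""
    if t == x:
        return I                                   # λx.x = I
    if isinstance(t, tuple):
        f, a = t
        return app("S", abstract(x, f), abstract(x, a))
    return ("K", t)                                # λx.c = Kc for any other atom c

def substitute(t, x, b):
    """t[b/x]: replace each occurrence of the atom x in t by b."""
    if t == x:
        return b
    if isinstance(t, tuple):
        return (substitute(t[0], x, b), substitute(t[1], x, b))
    return t
```

For the body `y x (x z)`, applying `abstract("x", body)` to an argument `b` and normalizing gives the same term as `substitute(body, "x", "b")`.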


Lemma 19.2.2 [Ch41, St72]

Let φ be any acceptable indexing of the partial recursive functions.

There is a total computable function, written as an overbar (n ↦ n̄), from the
nonnegative integers (N) to the normal forms of C[S,K] (N_C[S,K]) and a term
υ∈N_C[S,K], such that

υ ī j̄ →* n̄, where n = φ_i(j), for all i,j∈N.

In particular, the overbar function may always be defined by

ī = λx.λy.x(x(···(xy)···)), where the number of occurrences of the variable x
applied to y is i [Ch41].

□

Lemma 19.2.3 [CF58, Kl80a]

There is a term μ∈C[S,K] that implements the least fixpoint function. That is,
μα →* α(μα) for all α∈C[S,K].

μ may be used to construct α_1, ···, α_m ∈ C[S,K] solving any simultaneous recursive
definitions of the form

α_1 x_1 ··· x_{n_1} →* β_1
···
α_m x_1 ··· x_{n_m} →* β_m

where each β_i is a term built from α_1, ···, α_m, x_1, ···, x_{n_i}, S, K, D, A and AP.
Specifically, μ = (λx.λf.f(xxf))(λx.λf.f(xxf))

□


Lemma 19.2.1 allows us to define procedures that substitute parameters into terms
with variables. Lemma 19.2.2 guarantees the existence of terms to compute arbitrary
computable functions on integers, saving us the trouble of constructing them
explicitly. Lemma 19.2.3 lets us construct terms that perform arbitrary rearrangements
of their arguments and themselves, even though those arguments may not be
integers. That is, we may write explicitly recursive programs.

Although arbitrary data structures, such as lists, may in principle be encoded
as integers, and all computable operations may be carried out on such encodings,
the computation steps involved in manipulations of encoded data structures may
not correspond correctly to the intended ones. So, we define structuring operations
in a new, but straightforward, way.


Definition 19.2.2

T = K

F = KI

P = λx.λy.λz.zxy

L = λx.xT

R = λx.xF

C = λx.λy.λz.x(yz)

M = λx.PxT

<α,β> = λz.zαβ

if α then β else γ = αβγ

□

T and F represent the usual truth values. Pαβ, or <α,β>, represents the ordered
pair of α and β. L and R are the left- and right-projection operations, respectively.
Pαβγ represents the conditional term that gives α when γ is T, and β when
γ is F. These observations are formalized by the following straightforward lemma.


Lemma 19.2.4

For all α,β,γ∈C[S,K,D]∪C[S,K,A]:

Pαβ →* <α,β>

L<α,β> →* α

R<α,β> →* β

Pαβγ →* if γ then α else β

if T then α else β →* α

if F then α else β →* β

Cαβγ →* α(βγ)

Mα →* <α,T>

□
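Several of these reductions can be checked mechanically. The sketch below builds T, F, P, L and R from Definition 19.2.2 by the abstraction rules of Lemma 19.2.1 and normalizes a few instances with a leftmost-outermost S-K reducer. Inert atoms such as `"a"` and `"b"` stand in for arbitrary terms. The representation and all Python names are assumptions made for the illustration.

```python
def app(head, *args):
    """Left-associated application: app(f, a, b) is the term (f a) b."""
    for a in args:
        head = (head, a)
    return head

def spine(t):
    args = []
    while isinstance(t, tuple):
        t, a = t
        args.append(a)
    return t, args[::-1]

def step(t):
    """One leftmost-outermost S-K step, or None at a normal form."""
    head, args = spine(t)
    if head == "K" and len(args) >= 2:
        return app(args[0], *args[2:])
    if head == "S" and len(args) >= 3:
        a, b, c = args[:3]
        return app(a, c, (b, c), *args[3:])
    for i, a in enumerate(args):
        r = step(a)
        if r is not None:
            return app(head, *args[:i], r, *args[i + 1:])
    return None

def normalize(t, limit=10_000):
    for _ in range(limit):
        r = step(t)
        if r is None:
            return t
        t = r
    raise RuntimeError("no normal form found within the step limit")

I = app("S", "K", "K")

def lam(x, t):
    """Abstraction λx.t as defined in Lemma 19.2.1."""
    if t == x:
        return I
    if isinstance(t, tuple):
        return app("S", lam(x, t[0]), lam(x, t[1]))
    return ("K", t)

T = "K"
F = ("K", I)
P = lam("x", lam("y", lam("z", app("z", "x", "y"))))   # pairing
L = lam("x", app("x", T))                              # left projection
R = lam("x", app("x", F))                              # right projection
OMEGA = app("S", I, I, app("S", I, I))                 # no normal form
```

Normalizing `app(L, app(P, "a", "b"))` gives `"a"`, and the same holds with `OMEGA` in place of `"b"`: the projection discards the second component without evaluating it.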


The more conventional pairing and projection operators defined on numerical
encodings satisfy similar properties for those α,β,γ that actually represent integers.
This restriction is particularly troublesome, since all integer representations are in
normal form. Thus, integer-encoded operators are strict, while L(Pαβ) as defined
above reduces to α even if β has no normal form.

Pairing functions may be used to define lists. In order to be able to test a list
for emptiness, we pair up every element with a Boolean value indicating whether
the end of list has been reached. This is necessary because the property of being
an ordered pair, in the sense of < >, is not testable within the calculus.


Definition 19.2.3

[ ] abbreviates <F,F>

[α_1, α_2, ···, α_n] abbreviates <T,<α_1,<T,<α_2, ··· <T,<α_n,<F,F>>> ··· >>>>,
for n≥1.

□


All of the usual list operators may be defined by terms in C[S,K] in such a way 
that reduction to normal form produces the same result as a "lazy" or outermost 
LISP program [FW76, HM76, O’D77]. 

Lemma 19.2.5 

The S—K and S—K—D combinator calculi are confluent, but the S-K—A cal- 
culus is not. 

Proof sketch: 

See [CF41, St72] for the S-K calculus. The general results of [O'D77, Kl80a]
cover S-K-D. ATF → T and ATF → F, disproving confluence for S-K-A.

□


Reduction systems, in general, may have no meaningful ways of identifying 
different reduction steps, other than by the states that they connect. When the 
states are terms, as in the three combinatory calculi, and when the definition of the 
arrow relation is given by rules allowing the replacement of certain subterms, it is
natural and useful to identify reduction steps with the occurrences of subterms that
they replace.

Definition 19.2.4

In any of the three reduction systems defined in this section, a redex is an
occurrence of a subterm that may be replaced by the → relation.

When α→β, a residual of a redex r in α is a redex in β directly resulting from r.

□


For example, in the S-K calculus, a redex is an occurrence of a subterm in one of
the forms Kαβ, Sαβγ. In a reduction of the form α[Sβγδ] → α[βδ(γδ)], the only
residual of a redex within α in the leftmost expression is the redex in exactly the
same position in the rightmost expression. The only residual of a redex within β or
γ is the redex in the corresponding position in the explicit copy of β or γ, and the
two residuals of a redex within δ are the two redexes in corresponding positions in
the two explicit copies of δ. The redex Sβγδ has no residual in this case, because it
is destroyed by the reduction. Residuals generalize naturally to arbitrarily long
sequences of reductions. When it is necessary to trace residuals through reduction
sequences, we will write α →[r] β to indicate that α reduces to β by replacing r or one
of its residuals in α. For a more precise treatment of residuals, see [HL79,
O'D77]. For the purposes of this section, the intuitive treatment above should
suffice.


Huet and Lévy also define sequentiality for reduction systems such as S—K 
and S—K—D, but their definition depends on the term structure of the states in 
these systems. See Section 16 for a discussion of sequentiality in term rewriting 
systems. We will isolate one consequence of the sequentiality of S-K, and
nonsequentiality of S-K-D, that may be expressed purely in terms of the reduction
graphs.

Definition 19.2.5

A reduction system <S,→> has property A if there is a function f:N²→N such
that the following holds for all α,β,γ,δ∈S:

if α is a (not necessarily unique) least common ancestor of β,γ in the graph of →,
and β →^m δ, γ →^n δ, then there is a reduction sequence α →^k δ with k ≤ f(m,n).

□

Intuitively, property A says that the upper reductions shown in Figure 19.2.1 cannot
be too much longer than the lower ones.

Figure 19.2.1

In order to establish property A for the S-K calculus, we need a way of choosing
a standard reduction sequence for a particular pair of terms.


Definition 19.2.6 [CF58]

The reduction sequence α_0 →[r_1] α_1 →[r_2] ··· →[r_m] α_m is standard if, whenever
r_{j+1} is a residual of a redex s in α_{j-1}, r_j either contains s as a subterm, or is
disjoint from s and to the left of it. In other words, redexes are reduced starting
with the leftmost-outermost one, and any left-outer redex that is skipped in favor of
a right or inner one will never be reduced.

□


The following lemma is from Curry and Feys [CF58]. 


Lemma 19.2.8

In the S-K calculus, if α →^i β, then there is a standard reduction of α to β with at
most 2^i steps.

□

Lemma 19.2.9

The S-K-D calculus does not have property A.

The S-K calculus has property A, with the function f(i,j) = 2^i + 2^j.

Proof sketch:

The S-K-D calculus contains subgraphs of the form described in Definition
19.2.5, with m=n=1, but k arbitrarily large. Let I_0 = I, I_{i+1} = I_i I. Notice that
I_{i+1} → I_i, and there are no other reductions possible on I_{i+1}. Let α = D(I_j T)(I_j T),
β = DT(I_j T), γ = D(I_j T)T, δ = T. β →^1 δ, and γ →^1 δ, but α →^{j+1} δ is the shortest
reduction sequence from α to δ.

For the S-K calculus, let α,β,γ,δ∈C[S,K], m,n∈N be as in the statement of
Definition 19.2.5. Since α is a least common ancestor of β,γ, the two reduction
sequences α→*β and α→*γ have no steps in common (i.e., reducing residuals of
the same redex). Let α→*β, α→*γ, α →^k δ, β →^{m'} δ, γ →^{n'} δ be the standard
reductions. By Lemma 19.2.8, m' ≤ 2^m, and n' ≤ 2^n.

Consider a redex r that is reduced in α→*β, but not in γ →^{n'} δ. Since r is not
reduced in α→*γ, it must be eliminated in α→*γ →^{n'} δ by an application of the
rule Kφψ → φ, with r in ψ. In the standard reduction α →^k δ, the K-reduction above
comes before the reduction of r, so r is not reduced in α →^k δ. Thus, every redex
that is reduced in α→*β, and in α →^k δ must also be reduced in γ →^{n'} δ. A
symmetric argument covers α→*γ and β →^{m'} δ. Every redex reduced in α →^k δ comes
from either α→*β or α→*γ, so k ≤ m'+n' ≤ 2^m + 2^n.

□
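The first half of the proof sketch can be observed on small instances by breadth-first search over the S-K-D reduction graph. The sketch below counts genuine S-K-D steps with I expanded to SKK, so the lengths it reports are larger than a count that treats I_{i+1} → I_i as a single step. The growth with j is what matters. Terms are nested pairs `(function, argument)` with string atoms, and all Python names are assumptions made for the illustration.

```python
from collections import deque

def app(head, *args):
    """Left-associated application: app(f, a, b) is the term (f a) b."""
    for a in args:
        head = (head, a)
    return head

def spine(t):
    args = []
    while isinstance(t, tuple):
        t, a = t
        args.append(a)
    return t, args[::-1]

I = app("S", "K", "K")
T = "K"
FALSE = ("K", I)

def reducts(t):
    """All terms reachable from t in exactly one S-K-D step."""
    head, args = spine(t)
    out = set()
    if len(args) == 2:
        a, b = args
        if head == "K":
            out.add(a)
        if head == "D" and (a == T or b == T):
            out.add(T)
        if head == "D" and a == FALSE and b == FALSE:
            out.add(FALSE)
    if len(args) == 3 and head == "S":
        a, b, c = args
        out.add(app(a, c, (b, c)))
    if isinstance(t, tuple):
        f, x = t
        out |= {(g, x) for g in reducts(f)}
        out |= {(f, y) for y in reducts(x)}
    return out

def shortest(src, dst):
    """Length of a shortest reduction sequence from src to dst, or None."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        t, dist = queue.popleft()
        if t == dst:
            return dist
        for u in reducts(t):
            if u not in seen:
                seen.add(u)
                queue.append((u, dist + 1))
    return None

def I_chain(j):
    """I_0 = I, I_{i+1} = I_i I."""
    t = I
    for _ in range(j):
        t = (t, I)
    return t

def alpha(j):
    return app("D", (I_chain(j), T), (I_chain(j), T))

def beta(j):
    return app("D", T, (I_chain(j), T))
```

For j = 0, 1, 2 the shortest reduction from `alpha(j)` to T has length 3, 5 and 7 in this step count, while `beta(j)` always reaches T in a single step.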


19.3. Simulation of One Reduction System by Another 


In the introduction to Section 19 we argued that, although every function com- 
puted in the S—K—D calculus may be computed in the S—K calculus, there are 
certain computations, with a parallel flavor, that can be produced by S—K—D but 
not by S—K. In this subsection, we propose a definition of simulation for reduction 
systems, that seems to capture the essential elements of simulations that preserve 
the general structure of a computation. As in the definition of stepwise simulation 
for conventional random access machines, we associate with each state in a guest 
system one or more states in a host system, which will represent the guest state. 
The association of guest computation steps with host computation steps is trickier. 
It is not appropriate to insist that every guest computation step be associated with 
a single contiguous path in the host system, since potentially parallel guest steps, 
when interpreted as multistep paths in the host, could well have their individual 
steps interleaved. On the other hand, it is not enough merely to require that α→β
in the guest if, and only if, α'→*β' for associated states in the host, since that
requirement still allows pathological behaviors in the host that do not correspond to
any such behavior in the guest. For example, if there is a large, simple cycle
α_1 → α_2 → ··· → α_n → α_1 in the guest, the host would be allowed to have spurious
reduction paths directly between α_i' and α_j', without involving the appropriate
intermediate steps. The host could even have infinite reductions within equivalence
classes of states representing a single guest state.


The following definition of simulation is not clearly the right one, but it is at 
least a very plausible one, and addresses the concerns described above. The posi- 
tive and negative results about simulations in Sections 19.4, 19.5 and 19.6 provide 
evidence that the definition is reasonable, since they agree with a programming 
intuition about what attempts at simulation are and are not acceptable. The intent 
of the definition is to capture the idea that the set of possible choices in the host 
system must be exactly the same as the set of possible choices in the guest system, 
when the differences between different host representations of the same guest state 
are ignored. We do not require, however, that decisions in the host be made in the 
same order that they are made in the guest. Choices of different host representa- 
tions for the same guest state may predetermine future choices between different 
guest states. Roughly speaking, the host may be allowed to, but must not be 
required to, plan ahead in a computation sequence. Invisible book-keeping steps in 
the host are allowed, which do not change the represented guest state, but such 
steps are not allowed to grow without bound, else they could surreptitiously simu- 
late potential guest computation steps that have not been officially chosen for exe- 
cution in the host computation. 

Definition 19.3.1

Let <S_g, →_g> (the guest) and <S_h, →_h> (the host) be reduction systems.
S_h weakly simulates S_g if there exist

an encoding set E ⊆ S_h,

a decoding function d: S_h → S_g ∪ {nil} (nil ∉ S_g),

a computation relation →_c ⊆ →_h,

such that

1. d[E] = S_g & d⁻¹[N_{S_g}] ∩ E ⊆ N_{S_h}

2. ∀α,β∈S_h  α →_c β ⇒ d(α) →_g d(β)

3. ∀α,β∈S_h  α (→_h − →_c) β ⇒ d(α) = d(β)

4. ∀α∈E, β∈S_g  d(α) →_g β ⇒ ∃δ∈E  α (→_h − →_c)* →_c (→_h − →_c)* δ & d(δ) = β

5. ∀α∈N_{S_h}  d(α) ∈ N_{S_g} ∪ {nil}

6. There is no infinite →_h − →_c path

<S_h, →_h> simulates <S_g, →_g> if, in addition, there exists a
bound function b: S_g → N such that

6'. ∀α,β∈E  α (→_h − →_c)^m →_c (→_h − →_c)^n β ⇒ m ≤ b(d(α)) & n ≤ b(d(β))

A (weak) simulation is effective if the appropriate E, d, →_c, and b are all total
computable.

□


Intuitively, (1) requires that d maps E onto S_g, respecting normal forms.
d(α)∈S_g is the unique expression encoded by α∈S_h, but each β∈S_g may have
infinitely many encodings. The allowance for multiple encodings corresponds with
common practice, where, for example, a stack is encoded by any one of many
arrays containing the contents of the stack, plus extra meaningless values in unused
components of the array. (2), (3) and (4) require that each →_g reduction is simulated
by any number of →_h − →_c reductions, which do not change the encoded
expression, followed by exactly one →_c reduction to effect the change in the
encoded expression. (5) prevents dead ends in the simulation.


Notice that the effect of (1)-(4) is to select a subset E of S_h to represent S_g,
and divide it into equivalence classes in a one-one correspondence to S_g. A state
α∈S_h − E that is accessible by reductions from a state in E must also be associated
with a unique d(α)∈S_g. Each one-step reduction on S_g is simulated by one
or more reduction sequences between the equivalence classes in E. There may be
→_h reduction sequences that slip between the various classes d⁻¹[β]∩E, but they
still mimic →_g reductions by their behavior on the classes d⁻¹[β]. Alternatively,
think of d⁻¹[β]∩E as the set of canonical encodings of β from which any →_g
reduction may yet be chosen. Members of d⁻¹[β] − E still represent β, but may
require some additional reductions to display β canonically, or may predetermine
some restrictions on the →_g reductions of β.


The relation →_c could be always taken as the restriction of →_h to the union,
over all α≠β, of d⁻¹[α]×d⁻¹[β], except for the necessity of representing self-loops
of the form α →_g α. (6) prevents infinite reductions in →_h that do not accomplish
anything with respect to →_g. (1)-(6) together allow us to find a normal form for
α∈S_g by encoding it in a β∈E∩d⁻¹[α], reducing β to a normal form γ, then taking
d(γ) for the normal form of α. (6') strengthens (6) to require that the maximum
length of possible →_h − →_c reductions to α∈S_h is a function only of the encoded
expression d(α), not of the particular encoding α. Restriction 6' enforces the intuitive
rule that invisible book-keeping steps in the host computation must not be so complex
that they actually simulate potential guest computation steps that the host
chose not to perform.
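For finite systems, conditions (1)-(6) of Definition 19.3.1 can be checked exhaustively. The sketch below does so under stated assumptions: both systems are given as dictionaries mapping each state to its set of successors, `d` is a dictionary with `None` playing the role of nil, `comp` is the set of host edges chosen as the computation relation, and condition (6) is tested as acyclicity of the remaining edges, which is equivalent only because the host is finite. The function name `weakly_simulates` is hypothetical, and the bound of (6') is not checked.

```python
def weakly_simulates(guest, host, E, d, comp):
    """Check conditions (1)-(6) of Definition 19.3.1 on finite systems."""
    host_edges = {(a, b) for a in host for b in host[a]}
    silent = host_edges - comp                    # the relation →_h − →_c

    def normal(system, s):
        return not system[s]

    def silent_closure(states):
        """States reachable from `states` by zero or more silent steps."""
        seen, todo = set(states), list(states)
        while todo:
            a = todo.pop()
            for x, y in silent:
                if x == a and y not in seen:
                    seen.add(y)
                    todo.append(y)
        return seen

    # (1) d maps E onto the guest states, respecting normal forms
    if {d[a] for a in E} != set(guest):
        return False
    if not all(normal(host, a) for a in E if normal(guest, d[a])):
        return False
    # (2) every computation step decodes to a guest step
    if not comp <= host_edges:
        return False
    if not all(d[a] is not None and d[b] in guest[d[a]] for a, b in comp):
        return False
    # (3) silent steps do not change the decoded state
    if not all(d[a] == d[b] for a, b in silent):
        return False
    # (4) every guest step is available: silent*, one computation step, silent*
    for a in E:
        for target in guest[d[a]]:
            before = silent_closure({a})
            stepped = {y for x, y in comp if x in before}
            after = silent_closure(stepped)
            if not any(s in E and d[s] == target for s in after):
                return False
    # (5) host normal forms decode to guest normal forms or to nil
    if not all(d[a] is None or normal(guest, d[a])
               for a in host if normal(host, a)):
        return False
    # (6) no infinite silent path (no silent cycle, since the host is finite)
    for a in host:
        if a in silent_closure({y for x, y in silent if x == a}):
            return False
    return True

guest = {"a": {"b"}, "b": set()}
host = {"a0": {"a1"}, "a1": {"b0"}, "b0": set()}
E = {"a0", "b0"}
d = {"a0": "a", "a1": "a", "b0": "b"}
comp = {("a1", "b0")}
```

In this small example the host takes one book-keeping step from `a0` to `a1` before the step that actually simulates a → b. Adding a silent self-loop at `a1` violates (6), and declaring no computation steps at all violates (3).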


Notice that Definition 19.3.1 really has to do with simulating a certain degree
of nondeterminism, rather than parallelism. Simulating all possible degrees of
nondeterminism appears to be necessary for simulating all degrees of parallelism,
but is certainly not sufficient. It is not clear to us how to capture degree of
parallelism precisely at an appropriate level of abstraction.

The following lemma is straightforward, but tedious, to prove.


Lemma 19.3.1

(Effective) weak simulation and simulation are reflexive and transitive relations on
reduction systems.

□

While the bounding restriction (6') might seem excessive, there are certain
weak simulations that are intuitively unacceptable.


Theorem 19.3.1 

The S—K calculus effectively weakly simulates the S—K—D calculus. 

Proof sketch:

The basic idea is to encode a term Dαβ by an S-K term of the form ρ<ι,κ>. ι
and κ are programs producing possibly infinite lists of static data structures
representing the possible reductions of α and β respectively. ρ is a program using
"lazy" evaluation to alternately probe one step farther in ι and κ respectively, until a
T is found in one or the other, or an F is found in each. When such Boolean
values are found, ρ throws away the lists ι and κ, and produces the appropriate
Boolean value. The decoding function maps ρ<ι,κ> to Dαβ, where α and β are
the last items actually appearing on the lists ι and κ respectively, as long as no T
appears. As soon as a T appears, followed by nil (encoded as <F,F>) to mark
the end of list, the decoding function maps to T, even though the program ρ has
not yet discovered the T.

□


The weak simulation outlined above is intuitively unsatisfying, because it really
simulates the parallel or behavior of Dαβ by an explicit and rigid alternation of
steps on α and β. Although at first the programs ι and κ may proceed completely
independently, the behavior of ρ forces the first one of them to reach T to wait for
the other one to make at least as many steps before the normal form can be
reached. All of this catching up is hidden within the equivalence class encoding T.
In consequence, arbitrarily long sequences of reductions may go on entirely within
this equivalence class. It is precisely such arbitrarily long reductions within an
equivalence class that are ruled out by (6') in Definition 19.3.1.


It is useful to apply a geometric intuition to reduction systems by treating
them as (usually infinite) directed graphs whose nodes are the states, and whose
edges show the reduction relation. In the weak simulation of Theorem 19.3.1,
consider a term Dαβ, where α and β each reduce by a unique path to T. The graph
representing the part of the S—K—D calculus below Dαβ is suggested in Figure
19.3.1. Reductions down and to the left indicate those applying to α, and those
down and to the right apply to β. The terms along the lower left edge are all of
the form DTγ, where β →* γ, and those on the lower right are of the form DδT,
where α →* δ.


Figure 19.3.2 shows the part of the S—K graph below the encoding ρ<ι,κ> of
Dαβ. In this case, terms that show the same reductions to ι,κ, but different
amounts of reductions involving ρ, are gathered into one blob. Reductions to the
left indicate reductions to ι, those to the right to κ. Reductions involving ρ are
hidden in the blobs. The lower left edge contains terms of the form ρ<(··· T),γ>,
where κ →* γ, the lower right contains terms of the form ρ<γ,(··· T)>, where
ι →* γ. The dotted lines surround blobs representing T. Notice how arbitrarily
long paths arise along the lower edges within the region representing T. These
long paths violate (6').

For comparison purposes, Figure 19.3.3 shows the part of the S—K graph
below a term of the form if α then T else β, where α, β each reduce to T. This
form is often called the conditional or of α and β. The lower left edge contains
terms of the form if T then T else γ, and the lower right edge contains terms of
the form if γ then T else T.

Definition 19.3.2

An effective reduction system is universal if it effectively simulates all effective
reduction systems.

□


19.4. The Relative Powers of S—K, S—K—D, and S—K—A 


Theorem 19.4.1 

The S—K calculus does not simulate the S—K—D calculus. 

Proof sketch: 

Suppose the contrary, and let b be the bound function in the simulation. Consider
the S—K—D terms α = D (I^{2^{b(T)+1}} T) (I^{2^{b(T)+1}} T), β = D T (I^{2^{b(T)+1}} T),
γ = D (I^{2^{b(T)+1}} T) T, δ = T.
Notice that α is a least common ancestor of β and γ, and the shortest reduction of
α to δ is of length 2^{b(T)+1}+1. Choose S—K terms α'∈d⁻¹[α], β'∈d⁻¹[β]∩E,
β''∈d⁻¹[β], γ'∈d⁻¹[γ]∩E, γ''∈d⁻¹[γ], δ'∈d⁻¹[δ]∩E, such that α' →* β' →* β'',
α' →* γ' →* γ'', α' is a least common ancestor of β'',γ'', and β'',γ'' cannot be reduced
further within d⁻¹[β], d⁻¹[γ] (see Figure 19.4.1). The existence of such terms is
guaranteed by 3 and 4 of Definition 19.3.1, and by the confluence property. The
shortest reduction of α' to δ' must be of length at least 2^{b(T)+1}+1. So, by Lemma
19.2.9, β'' →^m δ', γ'' →^n δ', with 2^{b(T)+1}+1 ≤ 2^m+2^n. At least one of m,n must be
>b(T), contradicting restriction 6' of Definition 19.3.1.

□

Figure 19.4.1


Theorem 19.4.2

The S—K—D calculus does not simulate the S—K—A calculus.

Proof sketch:

The proof is elementary, since no confluent system may simulate a system that is
not confluent.

□

Theorem 19.4.3

The S—K—A calculus is universal.

Proof sketch:

Let <N,→> be an arbitrary effective reduction system, with nonnegative integers
for states (no loss of generality comes from the use of nonnegative integers). The
function ~:N→C[S,K] is the encoding of integers into combinatory terms from
Lemma 19.2.2 (the encoding of n is written ñ, or (…)~ for a compound expression).
Let ε∈C[S,K] be an equality test for encoded integers. Let
η∈C[S,K] be a combinatory term for a program that takes an argument α̃,
representing α∈N, and computes the encoded integer (η(α))~, where
η(α)=|{β∈N | α→β}|. Let ρ∈C[S,K] be a combinatory term for a program that
takes an argument α̃, and an encoded integer ĩ, and computes the encoded state β̃,
β∈N, that results from performing the ith possible reduction to α. That is,

∀i,j∈N  ε ĩ ĩ →* T & (i≠j ⇒ ε ĩ j̃ →* F),

∀α∈N  η α̃ →* (η(α))~,

and

∀α∈N, i∈N  ρ α̃ ĩ →* β̃_i,

where α→β_i is the ith reduction from α. Such η,ρ exist by Lemma 19.2.2. Let
K^0=I, K^{i+1}=λx.K(K^i x), A^0=I, A^{i+1}=λx.A(K^i x)A^i. By Lemmas 19.2.2,
19.2.3, there is an ι∈C[S,K,A] such that

ι α̃ →* if ε(η α̃)0̃ then α̃ else ι(ρ α̃ (A^{η(α)} 0̃ 1̃ ··· (η(α))~))

Now, we may encode each α∈N as ι α̃∈E if α is not in normal form, as α̃∈E if α is
in normal form. Let the nonnil values of d be determined by the following rule:

if ι α̃ →* β by a reduction sequence involving only A redexes and redexes in

ι α̃ →* if ε(η α̃)0̃ then α̃ else ι(ρ α̃ (A^{η(α)} 0̃ 1̃ ··· (η(α))~)) →*
if T then α̃ else ι(ρ α̃ (A^{η(α)} 0̃ 1̃ ··· (η(α))~)) →* α̃

(when α is in normal form), or

ι α̃ →* if ε(η α̃)0̃ then α̃ else ι(ρ α̃ (A^{η(α)} 0̃ 1̃ ··· (η(α))~)) →*
if F then α̃ else ι(ρ α̃ (A^{η(α)} 0̃ 1̃ ··· (η(α))~))

(when α is not in normal form), then d(β)=α.

□


Although the S—K—A calculus is technically universal, it is not a good foun- 
dation for equational computing. In fact, the universality of S—K—A illustrates 
that our definition of simulation captures degree of nondeterminism, rather than 
degree of parallelism, since the arbitrary choice operator is intuitively a sequential 
but nondeterministic construct. Many, if not most, parallel computations that a 
programmer wants to define have uniquely determined final results, in spite of non- 
determinism in the computation. The inherently indeterminate behavior of the A 
combinator makes it dangerous as a fundamental constructor for determinate com- 
putations. The best foundation for equational computing is probably a layered 
language, containing a subset of symbols that produce all of the desired deter- 
minate computations, and something like the A combinator, to be used in the infre- 
quent cases where truly indeterminate behavior is required. There may be other 
layers of disciplined behavior that should also be covered by simple sublanguages. 
In the next two subsections, we illuminate the behaviors that may be simulated by 
the S—K and S—K—D calculi. Section 19.7 develops other systems to simulate 
the behavior of the lambda calculus, which apparently cannot be simulated by any
of the combinatory calculi discussed so far.


19.5. The S—K Combinator Calculus Simulates All Simply Strongly Sequential Term Reduction Systems


In order to simulate an arbitrary simply strongly sequential term reduction system
(see Definition 17.2.3) S over Σ in the S—K calculus, the basic idea is to let
contiguous portions of an S—K term represent terms in (Σ∪{ω})_#. Initially, each
such term is of the form f(ω,···,ω), with exactly one symbol in Σ and the rest
ωs. Whenever one of the represented (Σ∪{ω})_# terms becomes stable, it produces
a direct syntactic representation of itself, that is accessible to operations from
above. As long as a represented (Σ∪{ω})_# term is a strictly partial redex, it
absorbs the topmost symbol from an index position below it (which can only be
done after that index position becomes stable). When a represented (Σ∪{ω})_#
term becomes a complete redex, it produces the associated right hand side of a rule
in S, in the initial form of f(ω,···,ω) representations.

First, we define the direct syntactic representation of terms in (Σ∪{ω})_#.
This representation is essentially the representation of terms by nested lists in LISP
[McC60], composed with the encoding of lists in S—K of Definition 19.2.3.

Definition 19.5.1

The syntactic encoder syn:(Σ∪{ω})_# → C[S,K] is defined as follows.

Let α'∈(Σ∪V)_# be the result of replacing the occurrences of ω in α by x_1,x_2,···
in order from left to right.

closure_n(β) = λx_1.···λx_n.β

syn(α) = closure_n(syn'(α')), where n is the number of ωs in α, and

syn':(Σ∪V)_# → ({S,K,AP}∪V)_# is defined inductively by the following equations:

syn'(x) = x for x∈V

syn'(a_i) = [ĩ] if ρ(a_i)=0

syn'(a_i(α_1,···,α_{ρ(a_i)})) = [ĩ, syn'(α_1), ···, syn'(α_{ρ(a_i)})]

In the lines above, the expressions [ĩ] and [ĩ,syn'(α_1),···] indicate the list
encodings of Definition 19.2.3.

□


Notice that syn(α) contains no variables, and is in normal form.
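
The nested-list layer of the syntactic encoder can be sketched in Python. This is only an illustration of the list structure: it stops before the S—K encoding of lists from Definition 19.2.3, and it returns the variables introduced for the ωs instead of building lambda abstractions. The term representation (tuples headed by a symbol name, the string "ω" for ω) and the name syn_lists are assumptions of this sketch.

```python
OMEGA = "ω"  # placeholder for an unknown subterm


def syn_lists(term, index_of):
    """Return (variables, nested_list) for a term over the symbols in index_of.

    A term is either OMEGA or a tuple (symbol, child_1, ..., child_n).
    Each ω is replaced by a fresh variable x1, x2, ... from left to right,
    and each symbol is replaced by its integer index at the head of a list.
    """
    counter = 0

    def walk(t):
        nonlocal counter
        if t == OMEGA:
            counter += 1
            return f"x{counter}"
        symbol, *children = t
        return [index_of[symbol]] + [walk(child) for child in children]

    body = walk(term)
    variables = [f"x{k}" for k in range(1, counter + 1)]
    return variables, body


index_of = {"f": 1, "a": 2, "b": 3}
print(syn_lists(("f", OMEGA, ("a",), OMEGA), index_of))
```

The returned variable list plays the role of closure_n: abstracting over those variables in order would give a closed term, matching the remark that syn(α) contains no variables.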

Now, we define the active semantic representation of a term in (Σ∪{ω})_#.
We depend on Lemma 19.2.3, which guarantees the ability to solve multiple
simultaneous recursive definitions in S—K. Only a finite number of (Σ∪{ω})_# terms
need be represented, so we do not need an infinitary mutual recursion.


Definition 19.5.2

Let P = {α∈(Σ∪{ω})_# | α is a partial redex,
or α is stable and ∀β<α β is a partial redex}

P is a finite set.

Let ψ∈C[S,K] implement the selector function for lists, with
∀α_1,···,α_m∈C[S,K], 1≤i≤m  ψ ĩ [α_1,···,α_m] →* α_i.

Let σ_{m,i}∈C[S,K] implement a spreading function, such that
∀α,β_1,···,β_m∈C[S,K], j∈N

σ_{m,i} j̃ α β_1 ··· β_m →* α β_1 ··· β_{i−1} (ψ 2̃ β_i) ··· (ψ (ρ(a_j)+1)~ β_i) β_{i+1} ··· β_m

The semantic encoders sem:Σ_# → C[S,K] and sem':P → C[S,K] are defined by
simultaneous recursion as follows:

sem'(α) →* syn(α) if α is stable

sem'(α) →* closure_m(sem(β)) if α=α'[ω/x_1,···,ω/x_m] and α'→β ∈ S.

sem'(α) x_1 ··· x_m →* σ_{m,i} (ψ 1̃ x_i) (ψ (ψ 1̃ x_i) (sem'(α_1) ··· sem'(α_n))) x_1 ··· x_m
where α is a partial redex but not a redex, the sequencing function for S chooses
the index α' for α, there are i−1 ωs to the left of the variable in α', and
α_j = α'[a_j(ω,···,ω)/x_i].

sem(x) = x where x∈V

sem(a_i) = sem'(a_i) if ρ(a_i)=0

sem(a_i(α_1,···,α_{ρ(a_i)})) = sem'(a_i(ω,···,ω)) sem(α_1) ··· sem(α_{ρ(a_i)})

Lemma 19.2.3 guarantees a solution for each sem(α).

□


The definitions of sem and sem' above may be understood as producing a set of
communicating sequential processes, which partition among themselves the nodes of
a term α∈Σ_#, and whose communication network represents the tree structure of
the term. Initially, each node of α is in a separate process, that knows only the
symbol at that node. Each process simultaneously tries to gather up enough nodes
to form a redex, or to learn that its nodes are stable. As long as a process
possesses a strictly partial redex, it requests the head node of the unique son
process specified by the sequencing function given in the definition of simply strongly
sequential. When a process possesses a whole redex, it performs the reduction
associated with that redex. When a process discovers that its nodes are stable, it
halts and produces messages that can be read by its father when the father process
wishes to gather up more nodes.


It is convenient to use a more reduced form of sem(α) as the canonical
representative of a term. The encoding function e gives the canonical
representative.

Definition 19.5.3

The encoding function e:Σ_# → C[S,K] is defined by:

e(α[β_1/x_1,···,β_m/x_m]) = sem'(α[ω/x_1,···,ω/x_m]) e(β_1) ··· e(β_m) where α is a
left hand side of a rule, or α[ω/x_1,···,ω/x_m] is a partial redex and β_1,···,β_m
are not strongly stable.

e(α[β_1/x_1,···,β_m/x_m]) = syn(α)[e(β_1)/x_1,···,e(β_m)/x_m] where every node in
α is strongly stable, and β_1,···,β_m are not strongly stable.

□


e partitions a term (considered in graphical form) into contiguous regions that 
form maximal partial redexes, and the intervening stable regions. Each partial 
redex is represented by the semantic encoding, and each stable region is 
represented by the syntactic encoding. The nonoverlapping property of regular 
reduction systems (Definition 17.1.3) guarantees that the partition, hence e, is 
well-defined. 

Lemma 19.5.1

1) syn, sem, and e are one-to-one.

2) sem(α) →* e(α)

3) If α→β ∈ S, and ᾱ→β̄ is an instance of α→β with ᾱ = α[γ_1/x_1,···,γ_m/x_m],
then
e(ᾱ) = sem'(α) e(γ_1) ··· e(γ_m) →* sem(β)[e(γ_1)/x_1,···,e(γ_m)/x_m] →* e(β̄)

4) If α is in normal form, then e(α) = syn(α) is also in normal form.

Proof sketch:

All is straightforward except 2. The key fact in showing 2 is that the
nonoverlapping property of Definition 17.1.3 guarantees that when α is a left hand side of a
rule schema, then every proper subterm of α[ω/x_1,···,ω/x_m], other than ω, is
strongly stable. Thus, the nonroot nodes of a redex convert to the syntactic
encoding, and may be gathered into the appropriate semantic encoding form at the root
of the partial redex.

□

Theorem 19.5.1 

The S—K calculus effectively simulates every simply strongly sequential term 
reduction system. 

Proof: 

Let the encoding set E be the range of e. The nonnil values of the decoder d are
defined by the following condition: if α→β_1,···,α→β_n are all of the possible
one-step reductions of α, and sem(α) →* γ by reducing a subset of the redexes in
the shortest reductions

sem(α) →* e(α) →* sem(β_1), ···, sem(α) →* e(α) →* sem(β_n),

not containing all of the redexes in any one of the preceding reductions, then
d(γ)=α.

—, = —N(CIS,K]xE). 


□


19.6. The S—K—D Combinator Calculus Simulates All Regular Term Reduction Systems


The basic idea of the simulation of an arbitrary regular term reduction system by
S—K—D is similar to the simulation of simply strongly sequential systems by S—K
in Section 19.5. Instead of choosing a unique index position to absorb into a
partial redex, the simulation tries in parallel to match every redex at every node in the
term. The parallelism between different nodes is treated by similar parallelism in
the S—K—D calculus; parallelism at a single node uses the D combinator. The
problem is that D merely gives the or of its arguments, it does not tell us which
one came out T. In some cases, where the left hand sides of rules are unifiable,
the right hand sides must be the same, so it is not important to know which of the
unifiable left hand sides applied. In other cases, where right hand sides are
different, it is crucial to determine which one to use. The solution is to first
discover that some rule applies at a particular node, then test, in sequence, each left
hand side of a rule. In testing a given rule schema, check every node in the left
hand side of that rule schema in parallel (using the obvious simulation of a parallel
and by a parallel or). The regularity of the system guarantees that, given that
some rule applies, the parallel test whether a particular rule applies must halt. The
following simple example illustrates this idea. The left hand sides in this example
come from [HL79].


Example 19.6.1

Consider the rule schemata f(x,a,b)→1, f(b,x,a)→2, f(a,b,x)→3. The
system defined by these rules is not strongly sequential, because there is no way to
choose which son of an f to work on first. In a simulation of a computation in this
system, suppose the test "does the current node match rule 1 or rule 2 or rule 3?",
carried out with the parallel or, answers T. In order to find out which of the three
rules applies, try the three tests: "is the 2nd son a and the 3rd son b?", "is the 1st
son b and the 3rd son a?", "is the 1st son a and the 2nd son b?", sequentially,
using a parallel and in each one (equivalently, test "is not the 2nd son not a or
the 3rd son not b?", etc.). Since the nonoverlapping property holds, the tests that
do not correspond to the applicable rule must differ from the correct one in some
position, so the parallel and will produce an F response, rather than
nontermination.

□
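
The two-phase test of Example 19.6.1 can be imitated with three-valued logic in Python. This is a static approximation rather than the D combinator itself: a son that has not yet been reduced to a head symbol is marked "?", and a test whose outcome is still undetermined returns None. All names (par_or, par_and, rule_applies, which_rule) and the "?" marker are assumptions of this sketch.

```python
UNKNOWN = "?"  # a son not yet reduced far enough to show its head symbol
VAR = "x"      # a variable position in a left hand side

# Left hand sides of f(x,a,b) -> 1, f(b,x,a) -> 2, f(a,b,x) -> 3.
RULES = {1: (VAR, "a", "b"), 2: ("b", VAR, "a"), 3: ("a", "b", VAR)}


def par_or(values):
    """True if any value is True, False if all are False, else None."""
    values = list(values)
    if any(v is True for v in values):
        return True
    if all(v is False for v in values):
        return False
    return None


def par_and(values):
    """Parallel and, obtained from par_or by negating inputs and output."""
    negated = par_or(None if v is None else not v for v in values)
    return None if negated is None else not negated


def son_matches(son, symbol):
    return None if son == UNKNOWN else son == symbol


def rule_applies(number, sons):
    pattern = RULES[number]
    return par_and(son_matches(s, p) for s, p in zip(sons, pattern) if p != VAR)


def which_rule(sons):
    """First learn that some rule applies, then test the rules in sequence."""
    if par_or(rule_applies(n, sons) for n in RULES) is not True:
        return None
    for n in RULES:
        if rule_applies(n, sons) is True:
            return n
    return None


print(which_rule((UNKNOWN, "a", "b")))
print(rule_applies(2, (UNKNOWN, "a", "b")))
```

With sons ("?", "a", "b"), rule 1 is selected even though the first son is unknown, and the tests for rules 2 and 3 answer False instead of remaining undetermined, because each differs from rule 1 in a position that is already known.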


The simulation of regular systems by S—K—D uses the same syntactic
encoder syn of Definition 19.5.1, but a new semantic encoder.

Definition 19.6.1

Let the regular set S of rule schemata be partitioned into equivalence classes
S_1,···,S_k, where α_1→β_1 is equivalent to α_2→β_2 if α_1 and α_2 are unifiable. By
the definition of regularity, β_1 and β_2 must be the same.

Let P and ψ be as in Definition 19.5.2.

In addition, let K^0=I, K^{i+1}=λx.K(K^i x), D^0=I, D^{i+1}=λx.D(K^i x)D^i.

Let N=λx.xFT implement logical negation, and let χ_1∈C[S,K] be a program to
check one symbol in a syntactic encoding against the corresponding position in a
rule schema. That is,

∀α∈(Σ∪{ω})_#, i,j∈N
α agrees with the jth symbol in rule schema i ⇒ χ_1 ĩ j̃ syn(α) →* T &
α does not agree with the jth symbol in rule schema i ⇒ χ_1 ĩ j̃ syn(α) →* F

χ_2 checks a rule schema in parallel, that is,

∀α∈C[S,K], i∈N  χ_2 ĩ α →* N(D^{s_i} (N(χ_1 ĩ 1̃ α)) ··· (N(χ_1 ĩ s̃_i α))),

where s_i is the size of the ith left hand side.

χ checks an entire equivalence class of rule schemata in parallel, that is,

∀α∈C[S,K], j∈N  χ j̃ α →* D^n (χ_2 ĩ_1 α) ··· (χ_2 ĩ_n α)

where i_1,···,i_n are the numbers of the rule schemata in the jth equivalence class.

Let a:C[S,K]×(Σ∪V)_# → ({S,K,AP}∪V)_# be a function that builds a syntactic
encoding of a term, applying a given combinator to every node. That is,

a(α,x) = x for x∈V

a(α,a_i(β_1,···,β_m)) = α[ĩ, a(α,β_1), ···, a(α,β_m)]

Let σ_j be a program that constructs a right hand side instance of the jth
equivalence class of rule schema from a left hand side instance, with a specified
operator applied to each node in the right hand side. That is,

∀γ_1,···,γ_m,δ∈C[S,K]  σ_j δ syn(α)[γ_1/x_1,···,γ_m/x_m] →* a(δ,β)[γ_1/x_1,···,γ_m/x_m]

where α→β is in the jth equivalence class.

The parallel semantic encoders psem'∈C[S,K] and psem:Σ_# → C[S,K] are defined
by recursion as follows:

psem' x →* if D^k (χ 1̃ x) ··· (χ k̃ x) then
               (if χ 1̃ x then σ_1 psem'
                else if χ 2̃ x then σ_2 psem'
                ···
                else if χ k̃ x then σ_k psem') x
           else x

where k is the number of equivalence classes of rule schemata in S.

psem(α) = a(psem',α)

□
As before, we define a more reduced encoding than that given by psem.

Definition 19.6.2

The parallel encoder pe:Σ_# → C[S,K] is defined by:

pe(a_i(α_1,···,α_m)) =
    psem' [ĩ, pe(α_1), ···, pe(α_m)]   if a_i(α_1,···,α_m) is an ω-potential redex
    [ĩ, pe(α_1), ···, pe(α_m)]         if a_i(α_1,···,α_m) is strongly stable

Lemma 19.6.1

1) psem and pe are one-to-one.

2) psem(α) →* pe(α)

3) If α→β ∈ S, and ᾱ→β̄ is an instance of α→β with ᾱ = α[γ_1/x_1,···,γ_m/x_m],
then pe(ᾱ) →* psem(β)[pe(γ_1)/x_1,···,pe(γ_m)/x_m] →* pe(β̄)

4) If α is in normal form, then pe(α) = syn(α) is also in normal form.

Proof sketch: analogous to Lemma 19.5.1.

□
Theorem 19.6.1

The S—K—D combinator calculus effectively simulates every regular term
reduction system.

Proof sketch: analogous to Theorem 19.5.1.

□


19.7. The Power of the Lambda Calculus 


Another mathematically natural candidate for a universal equational language is 
the Lambda Calculus [Ch41]. 

Definition 19.7.1

Given an infinite set of nullary symbols V, called variables, Λ[V] = {λx | x∈V}.
Each λx is intended as a one-argument function symbol.

The Lambda Calculus is the reduction system <Λ/≡,→> where
Λ = ({AP}∪V∪Λ[V])_# is the conventional set of lambda terms. As in the
combinator calculus, AP is abbreviated by juxtaposition, and associates to the left.
λx(α) is written λx.α. An occurrence of a variable x in λx.α is a bound
occurrence, all other occurrences of variables are free.

α≡β if α may be transformed to β by systematic renaming of bound variables
(e.g., λx.x≡λy.y). In the sequel, a lambda term α denotes the equivalence class
{β | α≡β}.

→ is defined by

(λx.α)β → α[β/x]

α→β ⇒ αγ→βγ & γα→γβ

where α[β/x] denotes the result of substituting β for each free occurrence of x in
α, renaming bound variables as necessary so that free variables remain free.

□


The Lambda Calculus is, a priori, a weaker candidate for a universal equational
machine language than the S—K Combinator Calculus, because a single reduction
step appears to require an unbounded amount of work, depending on the number of
occurrences of the variable being substituted for.

The Lambda Calculus may be compiled into the S—K Combinator Calculus
by the translation ~ defined as follows (the translation of a term α is written α̃,
or (α)~ when α is compound).

Definition 19.7.2

x̃ = x
(λx.x)~ = I
(λx.y)~ = Ky, where x≠y
(λx.αβ)~ = S (λx.α)~ (λx.β)~
(αβ)~ = α̃ β̃

□

The translation of Definition 19.7.2 has been proposed as a method for compiling
the Lambda Calculus into the more primitive Combinator Calculus, because it
satisfies the desirable property of the well-known Theorem 19.7.1 [St72].

Theorem 19.7.1

((λx.α)β)~ →* (α[β/x])~, for all x∈V, α,β∈Λ.

□


Unfortunately, the translation does not satisfy the stronger property α→β ⇒ α̃ →* β̃.

Example 19.7.1 [St72]

λx.((λy.y)z) → λx.z, yet (λx.((λy.y)z))~ = S(KI)(Kz), which is in normal form.
Consider the translation of λx.((λy.y)z) into combinators, step by step:

(λx.((λy.y)z))~ = S (λx.(λy.y))~ (λx.z)~ = S (λx.I)~ (λx.z)~ = S(KI)(λx.z)~ =
S(KI)(Kz)

Compare the derivation of (λx.((λy.y)z))~ to the following one of the subexpression
((λy.y)z)~:

((λy.y)z)~ = (λy.y)~ z̃ = Iz

Notice that, by itself, the subexpression (λy.y)z translated to Iz, which reduces to
z. But, inside the binding λx, the I and the z are separated into S(KI)(Kz).
Once the latter expression is applied to an argument, the redex corresponding to
the Iz is created, as in

S(KI)(Kz)w → KIw(Kzw) → I(Kzw) → Kzw → z.

□
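
The translation of Definition 19.7.2 and the reductions of Example 19.7.1 can be replayed with a short Python sketch. The term representation is an assumption of the sketch: a variable or combinator is a string, an application is a pair (function, argument), and an abstraction is a triple ("λ", variable, body). The reducer takes leftmost-outermost steps using the usual rules for I, K and S. Variables named "I", "K", "S" or "λ" would be confused with the combinators, so the sketch assumes they are not used.

```python
def abstract(x, term):
    """Remove variable x from a lambda-free term (the three λx clauses)."""
    if term == x:
        return "I"
    if isinstance(term, str):
        return ("K", term)
    function, argument = term
    return (("S", abstract(x, function)), abstract(x, argument))


def translate(term):
    """Translate a lambda term into an S, K, I term, innermost λ first."""
    if isinstance(term, str):
        return term
    if len(term) == 3:  # ("λ", x, body)
        _, x, body = term
        return abstract(x, translate(body))
    function, argument = term
    return (translate(function), translate(argument))


def spine(term):
    """Split a term into its head symbol and the list of its arguments."""
    args = []
    while isinstance(term, tuple):
        term, argument = term
        args.append(argument)
    return term, args[::-1]


def build(head, args):
    for argument in args:
        head = (head, argument)
    return head


def step(term):
    """One leftmost-outermost step, or None if term is in normal form."""
    head, args = spine(term)
    if head == "I" and len(args) >= 1:
        return build(args[0], args[1:])
    if head == "K" and len(args) >= 2:
        return build(args[0], args[2:])
    if head == "S" and len(args) >= 3:
        f, g, x = args[:3]
        return build(((f, x), (g, x)), args[3:])
    for i, argument in enumerate(args):
        reduced = step(argument)
        if reduced is not None:
            return build(head, args[:i] + [reduced] + args[i + 1:])
    return None


def reduction_sequence(term, limit=100):
    sequence = [term]
    while len(sequence) <= limit:
        following = step(sequence[-1])
        if following is None:
            break
        sequence.append(following)
    return sequence


code = translate(("λ", "x", (("λ", "y", "y"), "z")))
print(code)                                  # the pair structure of S(KI)(Kz)
print(step(code))                            # None: already in normal form
print(reduction_sequence((code, "w"))[-1])   # final term after applying to w
```

The translated term has no redex until it is applied to an argument, which is the loss of the inner reduction step that the example points out.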


Example 19.7.1 shows that the translation into combinators enforces outermost
evaluation in some cases, eliminating the possibility of taking an innermost step.
Since, in principle, the two redexes in λx.((λy.y)z) might be reduced concurrently,
the translation into combinators does not provide a simulation of the Lambda
Calculus according to Definition 19.3.1. Intuitively, the standard translation into
combinators seems to be deficient if the translated program is to be executed on
parallel hardware. There are a number of improvements to the simple translation
of Definition 19.7.2, which solve the problem of Example 19.7.1 and similar small
examples. None of the known improvements realizes the full parallelism of the
Lambda Calculus, however. The equational program for the Lambda Calculus
presented in Section 9.10 suffers from a similar loss of parallelism.


We have not been able to prove that the Combinator Calculus cannot simulate
the Lambda Calculus, but we conjecture that it cannot. Klop [Kl80b] has
demonstrated some interesting graph-theoretic differences between the λ-calculus and the
S—K calculus, but they do not rule out the possibility of a simulation. It is
well-known in combinatory logic that no finite set of equations between combinators can
provide a full simulation of the Lambda Calculus under the standard translation
(or any of the known variants). Furthermore, informal reflection on Example
19.7.1, and similar examples, shows that, in the Lambda Calculus, reduction of an
outer redex may leapfrog an inner redex to substitute for a variable inside it.
Neither the Combinator Calculus, nor any other regular term reduction system, may
display such behavior. The translations allowed by Definition 19.3.1, however,
include ones that completely change the term structure, so this observation does not
lead to a proof.


There is a variation on the Lambda Calculus, similar to that in Section 9.6,
that preserves all of the apparent parallelism while doing only a small, bounded
amount of work in each reduction step [OS84]. The essence of the variation is
given by the first set of equations in that section, before removing overlap. The
difficult part of the variation is the efficient renaming of bound variables to avoid
capture. This variation cannot be programmed in the equation interpreter, because
it involves an inherent violation of the nonoverlapping restriction. The Lambda
Calculus has Property A of Definition 19.2.5, so it cannot simulate the parallel or.
The apparent deficiency of combinators resulting in the inability to simulate the
Lambda Calculus is separate, then, from the deficiency with respect to the parallel
or. Having found these two deficiencies, we should be very careful about accepting
any reduction system, even the Lambda Calculus plus the parallel or, as sufficient
for parallel computation, without a solid proof.


19.8. Unsolved Problems 


The definition of simulation in this section is plausible, but is not precisely the right
one. The results of this section should be taken as a critique of the definition of
simulation, as much as statements about the particular reduction systems studied.
Besides the essential problem of characterizing parallelism, instead of just
nondeterminism, the definition of simulation may not be exactly right even for
capturing the degree of nondeterminism.


It is disturbing that, although the intuitive difference between the S—K and
S—K—D calculi has to do with optional sequential or parallel computation versus
required parallelism, the proof that S—K cannot simulate S—K—D hinges on
simple abstract graph-theoretic properties of the two calculi. We had expected the
proof that S—K does not simulate S—K—D to use recursion theory, since the
critical difference between S—K and S—K—D has to do with the existence of a
computable function to pick the next required computation step in S—K, and the lack
of any such computable function in S—K—D. In particular, it appears that
effective simulation of S—K—D by S—K should contradict standard
recursion-theoretic results by providing a recursive separation of {i | φ_i(0)=0} and
{i | φ_i(0)=1}. We could construct an S—K—D term of the form Dαβ, where α
tests φ_i(0)=0, and β tests φ_i(0)=1, simulate it with an S—K term γ, and if the
leftmost-outermost reduction of γ simulated reductions to α but not β, we should
be able to conclude that φ_i(0)≠1, and vice versa. We have not succeeded in
proving that the leftmost-outermost reductions of γ cannot simulate interleaved
reductions to α and β in such a case, although such a simulation looks impossible.


Whether or not the definition of simulation is exactly right, the simulations of
simply strongly sequential systems by S—K, and regular systems by S—K—D are
intuitively satisfying, and should be allowed by any reasonable definition. It would
be useful to know other simple reduction systems that simulate all systems in some
natural restricted classes. In particular, the existence of a confluent effective
reduction system that effectively simulates all other confluent effective reduction
systems is an important open question. Also, a strongly sequential reduction
system that simulates all others of its class would be useful. We conjecture that S—K
does not simulate all strongly sequential systems.


An interesting hierarchy could develop around a natural sequence of more and
more powerful combinators. We conjecture that the S—K—P calculus, using the
positive parallel or with the rules PTα→T, PαT→T, does not simulate the
S—K—D calculus, nor does the S—K—D simulate the S—K—E calculus with the
equality test defined by Eαα→T, and that the S—K—E calculus does not
simulate the S—K—F calculus with the rule Fαα→α. All of these systems are
confluent [Kl80a, Ch81]. A classification of the power of combinatory systems
should include the λ-calculus and the S—K—C (parallel if) calculus with the rules
CTαβ→α, CFαβ→β, and Cαββ→β, which may be even more powerful than
S—K—F.


20. Implementation of the Equation Interpreter 


Implementation of the algorithms, discussed in Section 18, for pattern matching,
sequencing, and reduction, is a rather well-defined programming task. The design
and coordination of the algorithmically conventional syntactic processors involved
in the interpreter constitute the more interesting implementation problems, so that
aspect is discussed in this section.


20.1. Basic Structure of the Implementation 


The goals of the interpreter implementation were to determine the practicality of 
the novel aspects of an equational interpreter as a computing engine, and provide 
the facility for preliminary experiments in the usefulness of the equational pro- 
gramming language as a programming language. These two goals are in some 
ways contrary to one another. Preliminary experiments in equational programming 
could be performed well on a very naive interpreter that would execute small pro- 
grams with a combined preprocessing and running time of a few seconds. In order 
to test the practicality of evaluation strategies as sources of computing power, we 
needed to provide better run-time performance than was needed for the program- 
ming experiments, even at the cost of substantial preprocessing work. We decided 
to emphasize the first goal, as long as the preprocessing time could be kept toler- 
able for small programs. This decision led to a two-dimensional structure for the 
interpreter, as shown in Figure 20.1.1. The vertical dimension shows the processing 
of an equational program into tables suitable to drive a fast interpreter. The
horizontal dimension shows an input term being reduced to an output normal form by
the resulting interpreter.


Figure 20.1.1


Very early experience convinced us that even simple programming experiments
would be prohibitively difficult without a good syntax for terms. In particular, we
started out naively with the standard mathematical notation for terms (Standmath
of Section 4.1). Although this is fine for metanotation, when we used it to write
equations for a pure LISP interpreter, writing cons(A, cons(B, nil)) instead of the
special LISP notation (A B), the result was very difficult to manage. Since pure
LISP is defined by a reasonably small and simple equational program, we decided
that the fault was in the notation. At the same time we realized that LISP
notation is unacceptably clumsy for other problems, such as the lambda calculus. So,
we decided to separate parsing and other syntactic manipulations completely from
the semantic essentials of the interpreter, allowing for a library of different
syntaxes to be chosen for different problems.

In this section the semantic essentials of the interpreter and preprocessor are
called the core programs, and the parsers and other syntactic transformers that
analyze the input and pretty-print the output are called syntactic shells. We
concentrated the implementation effort on the performance of the core programs, since
syntactic problems are already rather well understood. Design effort connected
with the shells went mostly toward flexibility of their interfaces, rather than the
internals of the parsers, etc.


We decided that it was important to be able to vary, not only the syntactic
forms seen by a user, but also the way in which terms are presented to the core
programs. The issue first arose regarding the efficiency of the pattern matcher -- in
some cases pattern-matching tables could be much smaller if each term of the form
f(A, B, C) were Curried into apply(apply(apply(f, A), B), C), reducing the
arity of symbols to a maximum of 2. In other cases, Currying could be wasteful,
and there are many other variations in the presentation of terms. In order to allow
flexibility in choosing an efficient internal form (the inner syntax of Section 4.4),
while guaranteeing functional equivalence at the user's level, we separated the
syntactic shell into two levels, shown in Figure 20.1.2.
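
The Currying transformation mentioned above can be sketched in a few lines of Python. The representation is an assumption of the sketch, not the interpreter's internal form: a nullary symbol is a string, and a term with arguments is a tuple (symbol, argument_1, ..., argument_n). The name "apply" is taken from the text and is assumed not to occur as a user symbol.

```python
def curry(term):
    """Rewrite f(A, B, C) as apply(apply(apply(f, A), B), C)."""
    if isinstance(term, str):
        return term
    symbol, *arguments = term
    result = symbol
    for argument in arguments:
        result = ("apply", result, curry(argument))
    return result


def uncurry(term):
    """Inverse of curry for terms whose head is a plain symbol."""
    if isinstance(term, str):
        return term
    arguments = []
    while isinstance(term, tuple) and term[0] == "apply":
        _, term, argument = term
        arguments.append(uncurry(argument))
    return (term, *reversed(arguments))


print(curry(("f", "A", "B", "C")))
```

After Currying, the only symbol with arguments is the binary apply, which is the reduction of arity to a maximum of 2 described in the text. The inverse function shows that the user-level term can be recovered.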


We use the terminology of transformational grammars [Ch65] to discuss the 
levels of syntactic processing, even though we do not use the transformational for- 
malism to define that processing. The source text produced by a user as input, or 
provided to a user as output, is called concrete syntax. The parsed form of con- 
crete syntax, showing the tree structure inherent in its text, is called surface 
abstract syntax -- from the surface structures of transformational grammars. 
Essentially, the translations between concrete syntax and surface abstract syntax 
are context-free, so a list of variables in the concrete syntax is still a list of vari- 
ables, in the same order, in the surface abstract syntax. Surface abstract syntax is 
transformed in non-context-free ways into deep abstract syntax. In particular, 
declarations of symbols and variables are processed, and each occurrence of a 
symbol is marked with the information in its declaration. This sort of processing is 
called "semantic analysis" in much of the literature on compiling, but we believe it 
is more illuminating to think of it as another component of syntactic analysis. 
Structural transformations, such as Currying, are also performed in the translation 
from surface to deep structure. Future versions of the interpreter may use 
transformations on abstract syntax to implement modular program constructors of 
the sort discussed in Section 14. The uniformity of the representation of abstract 
syntax at different levels allows both sorts of transformations, as well as the 
non-context-free syntactic processing, to be implemented by equational programs. 


Communication between different portions of the equation interpreter system 
is always done by files of characters. These files are implemented as pipelines 
whenever possible. Except for the input produced by, and the output seen by, a 
user, and the Pascal code produced by the preprocessor for inclusion in the inter- 
preter, all communication follows a standard format for abstract symbolic informa- 
tion, in which an abstract symbol is given by its length, type, and a descriptive 
string of characters. Section 20.2 describes this abstract symbolic format. A 
thorough understanding of the symbol format is crucial to the sophisticated user who 
wishes to produce his own syntactic processors. 


20.2. A Format for Abstract Symbolic Information 


Input to, and output from, the cores of the preprocessor and interpreter are always 
presented in a versatile and abstract syntactic form, which is computationally 
trivial to parse. Use of this special form is intended to remove all questions of syn- 
tactic processing from the core programs, both in order to simplify and clarify 
these programs, and to allow great flexibility in manipulating their syntactic shells. 
Because the equation interpreter is used as part of its own syntactic processing, it is 
easy to lose orientation when thinking about symbolic files. The key idea is that 
the format of a symbolic file must always be appropriate to the equational program 
for which it is the immediate input or output. 


The details of the representation of abstract syntax, described below, were 
designed for ease of experimentation. Although intended for internal use in 
computations, they are readable enough to be useful for debugging. As a result, the 
representations are highly inefficient, taking more space in most cases than the 
concrete syntax. A more mature version of the equation interpreter will compress the 
internal representations of abstract syntax, sacrificing readability for debugging 
purposes in favor of efficiency. 


A file of abstract symbols contains a contiguous sequence of abstract symbols, 
with no separators and no terminator. Each abstract symbol is of the form 


length type content 


presented contiguously with no separators. length is given as a nonnegative integer 
in normal base-10 notation, with no leading zeroes (except that zero itself is 
presented as "0"). type is a single character, with a meaning described later. 
content is a sequence of characters, the number of characters being exactly the given 
length. The idea for this representation was taken from FORTRAN’s FORMAT 
statements. 


There are 6 types of abstract symbol that are important to the equation 
interpreter system. The motivation for each of these types refers to the preprocessor or 
interpreter core program for which the symbol is an immediate input or output. A 
symbol that is presented to an interpreter has the same meaning as if it were 
presented to the preprocessor that produced that interpreter. 

M Metasymbol: a symbol with a special predetermined meaning to the system. 
L Literal symbol: a symbol whose meaning is given by the user in his definitions. 
A Atomic symbol: a symbol with no discernible structure or special meaning. 
C Character string: a symbol intended to denote the very sequence of characters 
  that is its content. 
I Integer symbol: a symbol intended to denote an integer number. 
T Truth symbol: a symbol denoting a truth value. 

Examples of abstract symbols are "1M(", "5Aabcde", "10Cdefinition", "1TT", "1TF", 
"4I1111", "4I-124". Informally, we will refer to abstract symbols by their contents 
when the type and length are clear from context. 
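
Because every symbol announces its own length, a file in this format can be split into symbols with a single left-to-right scan and no lookahead beyond the length field. The following Python sketch shows one way to do so; the function names read_symbols and write_symbol are hypothetical, and the error handling is an assumption of the sketch rather than a description of the core programs. 

```python
DIGITS = "0123456789"


def read_symbols(text):
    """Split a file of abstract symbols into (type, content) pairs."""
    symbols = []
    pos = 0
    while pos < len(text):
        start = pos
        # The length field is a run of decimal digits.
        while pos < len(text) and text[pos] in DIGITS:
            pos += 1
        if pos == start or pos >= len(text):
            raise ValueError(f"malformed symbol at offset {start}")
        length = int(text[start:pos])
        sym_type = text[pos]  # a single type character
        content = text[pos + 1 : pos + 1 + length]
        if len(content) != length:
            raise ValueError(f"truncated symbol at offset {start}")
        symbols.append((sym_type, content))
        pos += 1 + length
    return symbols


def write_symbol(sym_type, content):
    """Produce the length-type-content form of one abstract symbol."""
    return f"{len(content)}{sym_type}{content}"


print(read_symbols("1M(5Aabcde10Cdefinition1TT4I-124"))
# prints [('M', '('), ('A', 'abcde'), ('C', 'definition'), ('T', 'T'), ('I', '-124')]
```

Note that the content may itself contain digits or parentheses without any quoting, since the scanner never inspects the content while locating symbol boundaries. 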


Metasymbols include left and right parentheses, "(" and ")", and codes for the 
predefined classes of symbols, equations, and operations. Integer symbols are 
presented in normal base-10 notation, with a single preceding minus sign, "-", for 
negative integers. Truth symbols include "T" for truth and "F" for falsehood. The 
other three types of symbol are very easily confused, and require some initial study. 
Character strings are taken by the equation interpreter to which they are presented, 
and by the equation interpreter produced by the preprocessor to which they are 
presented, as textual data to be manipulated by concatenation, extraction of 
substrings, etc. The lexical relationships between different character strings may be 
important to the interpreter program that manipulates them. Literal symbols, 
presented to a preprocessor, are intended to be given meanings by the equations 
that the preprocessor is reading; presented to an interpreter, they are intended to 
have the meanings given by the equations from which that interpreter was produced. 
The lexical structure of literal symbols is irrelevant. Atomic symbols are 
opaque symbols with no meanings beyond their identities. The lexical structure of 
atomic symbols is irrelevant to an equational program that is processing them, but 
may become relevant in a later step if some nonequational syntactic transformation, 
such as the content operation of Section 13.2, maps atomic symbols to some 
other types of symbols. 


Either an atomic symbol, or a character string, may have a content with a special 
meaning to an earlier or later step in a sequence of programs. Thus, "4Acons" 
is an atomic symbol, whose content is "cons". That content could be accessed by a 
nonequational program at some point to produce the literal symbol "4Lcons". This 
near-brush with confusion is necessary in order for an equational program to be 
part of the syntactic processor for the equation preprocessor. Thus, equations 
defining a syntactic processor may use literal symbols, such as "4Lcons", in order to 
perform syntactic transformations on expressions containing the atomic symbol 
"4Acons", which will later become the literal symbol "4Lcons" when presented to 
the core of the preprocessor. 
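
The nonequational step that turns an atomic symbol into a literal symbol only has to change the type character. A minimal Python sketch follows, assuming the symbol is held as a string in length-type-content form; the function name retag is hypothetical. 

```python
def retag(symbol, new_type):
    """Return the abstract symbol with its type character replaced.

    The symbol is a string in length-type-content form, e.g. "4Acons".
    """
    digits = 0
    while digits < len(symbol) and symbol[digits] in "0123456789":
        digits += 1
    length = int(symbol[:digits])
    content = symbol[digits + 1:]
    if len(content) != length:
        raise ValueError("length field does not match content")
    return f"{length}{new_type}{content}"


print(retag("4Acons", "L"))
# prints 4Lcons
```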


The metasymbols that are meaningful to the equation interpreter system are: 


"(", ")" used to present the tree-structure of a term as input to or output from the 
preprocessor or the interpreter 


The remaining metasymbols are used only in input to the preprocessor: 

"V" marks an address of a formal variable 
"U" union of syntactic classes in variable qualification 
"2" the universal syntactic class 
"#" the empty syntactic class 
"A" the class of atomic_symbols 
"I" the class of integer_numerals 
"C" the class of character_strings 
"T" the class of truth_values 


"+", "-", "*", "/", "m", "=", "<" the predefined functions, used on right-hand sides 


Text containing the metasymbols "(" and ")" is intended to represent a term in the 
natural way. Such text will often be displayed for informal discussion with 
commas, spaces, and indentation to improve readability, although no such notation 
appears inside the machine. 


20.3. Syntactic Processors and Their Input/Output Forms 


The main goal in the organization of the syntactic processors is versatility. In 
addition to the variations in the external concrete syntax typed and seen by the user, 
described in Section 4, there are variations in the form in which material is 
presented to the core programs. These variations in internal syntax are provided 
because different encodings of terms may have radically different effects on the 
efficiency of the pattern-matching algorithms in the interpreter. The reasons for 
these effects are explained in Section 18.2, and the details of the different internal 
syntaxes are given in Section 4. 


Figure 20.3.1 refines Figure 20.1.2 to show all of the levels of syntactic 
analysis. The configurations for the preprocessor input analyzer, pre.in, the inter- 
preter input analyzer, int.in, and the interpreter output pretty-printer, int.out, are 
essentially the same, except that pre.in has one more step than the others, and 
int.out transforms from inner form into outer form -- the opposite direction from 
pre.in and int.in. In describing the different levels of syntactic processing, we will 
always refer to pre.in, which has the most complex forms. The forms for int.in 
and int.out are merely the term portions of pre.in syntaxes. 


Concrete syntax is defined in Sections 3, 4 and 8. Surface abstract syntax is 
intended to represent the concrete syntax directly, with purely typographical 
considerations removed. Surface abstract syntax is in the form: 


sspec(ssymspec(list_of_symspecs), sequspec(list_of_variables, list_of_equations)) 


"sspec", "ssymspec", "sequspec", and all other explicitly-given symbols, are literal 
symbols. The lists are represented in the usual way by the literal symbols "cons" 
and "nil". Elements of the list_of_symspecs are of the two forms: 


usersym(list_of_symbols, arity) 
predefsym(list_of_symbols) 


Elements of the list_of_symbols and list_of_variables are of the forms: 


litsym(atomic_symbol) 
atomsym(atomic_symbol) 
metasym(atomic_symbol) 


The contents of these atomic_symbols will become literal symbols, atomic symbols, 
and metasymbols respectively when they reach the core program. Arity is of the 
form 


intnum(atomic_symbol) 


The contents of this atomic_symbol will become an integer. Elements of the 
list_of_equations are of the two forms: 


qualequ(term, term, list_of_quals) 
predefequ(list_of_symbols) 


Each element of the list_of_quals is of the form: 


qualify(list_of_variables, list_of_terms) 


Terms within qualifiers may include the binary literal operator "qualterm" to 
introduce nested qualifications in the form: 


qualterm(term, list_of_quals) 


Qualifiers may be nested. 
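
To make the list convention concrete, the following Python sketch builds the "cons"/"nil" encoding of a list and prints one usersym declaration in the notation used above. The tuple representation of terms, the helper names, and the sample symbols f and g with arity 2 are assumptions chosen only for illustration. 

```python
def cons_list(items):
    """Encode a Python list as nested cons/nil terms."""
    term = "nil"
    for item in reversed(items):
        term = ("cons", item, term)
    return term


def show(term):
    """Render a tuple term in the usual f(a,b) notation."""
    if isinstance(term, str):
        return term
    head, *args = term
    return head + "(" + ",".join(show(a) for a in args) + ")"


# Hypothetical declaration: two user symbols, f and g, both of arity 2.
symbols = cons_list([("litsym", "f"), ("litsym", "g")])
declaration = ("usersym", symbols, ("intnum", "2"))
print(show(declaration))
# prints usersym(cons(litsym(f),cons(litsym(g),nil)),intnum(2))
```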
Because the syntactic processors will perform syntactic, rather than semantic, 
manipulations on terms, each term is represented somewhat indirectly by an 
abstract syntax showing its applicative structure, as described in Section 13.2. 
Thus, 


f(a,b,c,d) 


is represented by 


multiap[litsym[f]; (a b c d)] 
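
The applicative representation can be sketched in Python as follows. The sketch assumes that terms are nested tuples, that the head of every compound term is a literal symbol, and that arguments are themselves represented recursively; the function name to_multiap is hypothetical. 

```python
def to_multiap(term):
    """Render a term in the applicative form multiap[litsym[f]; (args)].

    A term is a string (a symbol without arguments) or a tuple
    (head, arg1, ..., argn) whose head is assumed to be a literal symbol.
    """
    if isinstance(term, str):
        return term
    head, *args = term
    rendered = " ".join(to_multiap(a) for a in args)
    return f"multiap[litsym[{head}]; ({rendered})]"


print(to_multiap(("f", "a", "b", "c", "d")))
# prints multiap[litsym[f]; (a b c d)]
```

The point of the indirection is that the head symbol f becomes ordinary data, the argument of litsym, so that an equational program can inspect and rewrite it like any other subterm. 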


In general, the structure of a specification given by the keywords in concrete syntax 
is translated directly into the structure of the surface abstract syntax term, shown 
by its literal symbols. Tokens other than keywords in the surface concrete syntax, 
including user-defined symbols, symbols intended to indicate predefined classes of 
symbols or equations, integers, character strings, and atomic symbols, are all 
translated into atomic symbols in the surface abstract syntax. All instances of 
integers, character strings, truth values, literals, and metasymbols are marked by the 
literal operators "intnum", "charstr", "truthval", "litsym", and "metasym" as shown 
above. Atomic symbols are not yet marked with "atomsym" because they have not 
yet been distinguished from variables. 


Deep abstract syntax is similar to surface abstract syntax, but information has 
been organized in a form more convenient to the core program than is the surface 
form. In particular, user-defined and predefined symbol declarations are separated, 
arities and similar tags are distributed over lists of symbols, and symbols are marked 
appropriately as variables, atomic symbols, integer numerals, character strings, and 
literals. Finally, all qualifications on variables replace the left-hand-side 
occurrences of variables that they qualify, and right-hand-side occurrences of 
variables are shown as 


varaddr(list_of_integers) 


where the list_of_integers gives the sequence of tree branches to be followed to 
find the corresponding left-hand-side occurrence of the variable. Equation 
schemata are substituted for invocations of predefined classes of equations. 
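
The replacement of right-hand-side variables by addresses can be sketched in Python as follows. The sketch assumes that terms are nested tuples, that branches are numbered from 1 in left-to-right order of arguments, and that the first left-hand-side occurrence of a variable is the one addressed; none of these conventions is fixed by the description above, and the function names are hypothetical. 

```python
def find_address(term, name, path=()):
    """Return the branch sequence leading to the first occurrence of name.

    Terms are strings (leaves) or tuples (head, arg1, ..., argn).
    Returns None if name does not occur among the leaves.
    """
    if isinstance(term, str):
        return list(path) if term == name else None
    for index, arg in enumerate(term[1:], start=1):
        found = find_address(arg, name, path + (index,))
        if found is not None:
            return found
    return None


def replace_variables(rhs, lhs, variables):
    """Replace each variable occurrence in rhs by a varaddr term."""
    if isinstance(rhs, str):
        if rhs in variables:
            return ("varaddr", find_address(lhs, rhs))
        return rhs
    return (rhs[0],) + tuple(replace_variables(a, lhs, variables) for a in rhs[1:])


# Hypothetical equation f(g(x), y) = h(y, x) with variables x and y.
lhs = ("f", ("g", "x"), "y")
rhs = ("h", "y", "x")
result = replace_variables(rhs, lhs, {"x", "y"})
print(result)
# prints ('h', ('varaddr', [2]), ('varaddr', [1, 1]))
```

Once every right-hand-side variable carries an address, the variable names themselves are no longer needed to instantiate the right-hand side. 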


A specification in deep abstract syntax is of the form: 


dspec(dsymspec(list_of_user_syms, 
               list_of_predef_syms), 
      dequspec(list_of_equations)) 


Elements of the list_of_user_syms are of the form: 


usersym(symbol, arity) 


Elements of the list_of_predef_syms are of the form: 


predefsym(symbol) 


Elements of the list_of_equations are of the form: 


equate(term, term) 


As in the surface abstract syntax, symbols given explicitly above are literal 
symbols, symbols taken from the surface concrete syntax are atomic symbols, and 
terms are represented in syntactic form with the operator "multiap". Every atomic 
symbol is now marked in one of the following forms to show its intended type: 


metasym(symbol) 
litsym(symbol) 
atomsym(symbol) 
charstr(symbol) 
truthval(symbol) 
intnum(symbol) 


In order to accommodate inner syntactic translations, such as Currying, the 
transformation of surface abstract syntax to deep abstract syntax goes in three 
steps. 

1. Declarations of literal symbols and variables are processed, and each 
occurrence of a symbol is marked by the appropriate tag. A variable x is given 
as qualvar[x; qualifications] if it is on the left-hand side of an equation, and 
simply variable[x] if it is on the right-hand side. 

2. Any inner syntactic transformations, such as Currying, are performed. 

3. Right-hand-side variables are replaced by the corresponding left-hand-side 
addresses, and all variable names are eliminated. 


Notice that this order of work is critical -- syntactic transformations may depend 
on the types of symbols encountered, and variable addresses must be assigned 
based on the transformed versions of the left-hand sides. Immediately before it is 
presented to a core program, each term in deep abstract syntax is transformed by 
the content operation of Section 13.2, to produce the semantically appropriate 
terms without mediation by the multiap symbol. 
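
A small experiment shows why addresses must be computed after the inner transformation rather than before it. The Python sketch below compares the address of a variable in a left-hand side before and after Currying; the tuple representation, the branch numbering from 1, and the function names are assumptions of the sketch. 

```python
def curry(term):
    """Rewrite (head, arg1, ..., argn) into nested binary apply terms."""
    if isinstance(term, str):
        return term
    result = term[0]
    for arg in term[1:]:
        result = ("apply", result, curry(arg))
    return result


def address(term, name, path=()):
    """Branch sequence to the first leaf equal to name, or None."""
    if isinstance(term, str):
        return list(path) if term == name else None
    for index, arg in enumerate(term[1:], start=1):
        found = address(arg, name, path + (index,))
        if found is not None:
            return found
    return None


lhs = ("f", "x", "y")
before = address(lhs, "x")         # address in the untransformed left-hand side
after = address(curry(lhs), "x")   # address in the Curried left-hand side
print(before, after)
# prints [1] [1, 2]
```

An address assigned before Currying would point at the wrong subterm of the Curried left-hand side, here apply(f,x) instead of x, which is why step 3 must follow step 2. 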